[PATCH 5.15 00/92] 5.15.126-rc1 review

Greg Kroah-Hartman posted 92 patches 2 years, 1 month ago
[PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Greg Kroah-Hartman 2 years, 1 month ago
This is the start of the stable review cycle for the 5.15.126 release.
There are 92 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.126-rc1.gz
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 5.15.126-rc1

Johan Hovold <johan+linaro@kernel.org>
    PM: sleep: wakeirq: fix wake irq arming

Chunfeng Yun <chunfeng.yun@mediatek.com>
    PM / wakeirq: support enabling wake-up irq after runtime_suspend called

Johan Hovold <johan+linaro@kernel.org>
    soundwire: fix enumeration completion

Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
    soundwire: bus: pm_runtime_request_resume on peripheral attachment

Sean Christopherson <seanjc@google.com>
    selftests/rseq: Play nice with binaries statically linked against glibc 2.35+

Michael Jeanson <mjeanson@efficios.com>
    selftests/rseq: check if libc rseq support is registered

Alexander Stein <alexander.stein@ew.tq-group.com>
    drm/imx/ipuv3: Fix front porch adjustment upon hactive aligning

Thomas Zimmermann <tzimmermann@suse.de>
    drm/fsl-dcu: Use drm_plane_helper_destroy()

Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    powerpc/mm/altmap: Fix altmap boundary check

Christophe JAILLET <christophe.jaillet@wanadoo.fr>
    mtd: rawnand: fsl_upm: Fix an off-by one test in fun_exec_op()

Johan Jonker <jbx6244@gmail.com>
    mtd: rawnand: rockchip: Align hwecc vs. raw page helper layouts

Johan Jonker <jbx6244@gmail.com>
    mtd: rawnand: rockchip: fix oobfree offset and description

Roger Quadros <rogerq@kernel.org>
    mtd: rawnand: omap_elm: Fix incorrect type in assignment

Jan Kara <jack@suse.cz>
    ext2: Drop fragment support

Jan Kara <jack@suse.cz>
    fs: Protect reconfiguration of sb read-write from racing writes

Alan Stern <stern@rowland.harvard.edu>
    net: usbnet: Fix WARNING in usbnet_start_xmit/usb_submit_urb

Sungwoo Kim <iam@sung-woo.kim>
    Bluetooth: L2CAP: Fix use-after-free in l2cap_sock_ready_cb

Prince Kumar Maurya <princekumarmaurya06@gmail.com>
    fs/sysv: Null check to prevent null-ptr-deref bug

Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    fs/ntfs3: Use __GFP_NOWARN allocation at ntfs_load_attr_list()

Linus Torvalds <torvalds@linux-foundation.org>
    file: reinstate f_pos locking optimization for regular files

Hou Tao <houtao1@huawei.com>
    bpf, cpumap: Make sure kthread is running before map update returns

Guchun Chen <guchun.chen@amd.com>
    drm/ttm: check null pointer before accessing when swapping

Aleksa Sarai <cyphar@cyphar.com>
    open: make RESOLVE_CACHED correctly test for O_TMPFILE

Jiri Olsa <jolsa@kernel.org>
    bpf: Disable preemption in bpf_event_output

Ilya Dryomov <idryomov@gmail.com>
    rbd: prevent busy loop when requesting exclusive lock

Paul Fertser <fercerpav@gmail.com>
    wifi: mt76: mt7615: do not advertise 5 GHz on first phy of MT7615D (DBDC)

Laszlo Ersek <lersek@redhat.com>
    net: tap_open(): set sk_uid from current_fsuid()

Laszlo Ersek <lersek@redhat.com>
    net: tun_chr_open(): set sk_uid from current_fsuid()

Dinh Nguyen <dinguyen@kernel.org>
    arm64: dts: stratix10: fix incorrect I2C property for SCL signal

Arseniy Krasnov <AVKrasnov@sberdevices.ru>
    mtd: rawnand: meson: fix OOB available bytes for ECC

Olivier Maignial <olivier.maignial@hotmail.fr>
    mtd: spinand: toshiba: Fix ecc_get_status

Sungjong Seo <sj1557.seo@samsung.com>
    exfat: release s_lock before calling dir_emit()

gaoming <gaoming20@hihonor.com>
    exfat: use kvmalloc_array/kvfree instead of kmalloc_array/kfree

Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
    firmware: arm_scmi: Drop OF node reference in the transport channel setup

Xiubo Li <xiubli@redhat.com>
    ceph: defer stopping mdsc delayed_work

Ross Maynard <bids.7405@bigpond.com>
    USB: zaurus: Add ID for A-300/B-500/C-700

Ilya Dryomov <idryomov@gmail.com>
    libceph: fix potential hang in ceph_osdc_notify()

Michael Kelley <mikelley@microsoft.com>
    scsi: storvsc: Limit max_sectors for virtual Fibre Channel devices

Steffen Maier <maier@linux.ibm.com>
    scsi: zfcp: Defer fc_rport blocking until after ADISC response

Eric Dumazet <edumazet@google.com>
    tcp_metrics: fix data-race in tcpm_suck_dst() vs fastopen

Eric Dumazet <edumazet@google.com>
    tcp_metrics: annotate data-races around tm->tcpm_net

Eric Dumazet <edumazet@google.com>
    tcp_metrics: annotate data-races around tm->tcpm_vals[]

Eric Dumazet <edumazet@google.com>
    tcp_metrics: annotate data-races around tm->tcpm_lock

Eric Dumazet <edumazet@google.com>
    tcp_metrics: annotate data-races around tm->tcpm_stamp

Eric Dumazet <edumazet@google.com>
    tcp_metrics: fix addr_same() helper

Jonas Gorski <jonas.gorski@bisdn.de>
    prestera: fix fallback to previous version on same major version

Jianbo Liu <jianbol@nvidia.com>
    net/mlx5: fs_core: Skip the FTs in the same FS_TYPE_PRIO_CHAINS fs_prio

Jianbo Liu <jianbol@nvidia.com>
    net/mlx5: fs_core: Make find_closest_ft more generic

Benjamin Poirier <bpoirier@nvidia.com>
    vxlan: Fix nexthop hash size

Yue Haibing <yuehaibing@huawei.com>
    ip6mr: Fix skb_under_panic in ip6mr_cache_report()

Alexandra Winter <wintera@linux.ibm.com>
    s390/qeth: Don't call dev_close/dev_open (DOWN/UP)

Lin Ma <linma@zju.edu.cn>
    net: dcb: choose correct policy to parse DCB_ATTR_BCN

Mark Brown <broonie@kernel.org>
    net: netsec: Ignore 'phy-mode' on SynQuacer in DT mode

Yuanjun Gong <ruc_gongyuanjun@163.com>
    net: korina: handle clk prepare error in korina_probe()

Dan Carpenter <dan.carpenter@linaro.org>
    net: ll_temac: fix error checking of irq_of_parse_and_map()

Yang Yingliang <yangyingliang@huawei.com>
    net: ll_temac: Switch to use dev_err_probe() helper

Tomas Glozar <tglozar@redhat.com>
    bpf: sockmap: Remove preempt_disable in sock_map_sk_acquire

valis <sec@valis.email>
    net/sched: cls_route: No longer copy tcf_result on update to avoid use-after-free

valis <sec@valis.email>
    net/sched: cls_fw: No longer copy tcf_result on update to avoid use-after-free

valis <sec@valis.email>
    net/sched: cls_u32: No longer copy tcf_result on update to avoid use-after-free

Hou Tao <houtao1@huawei.com>
    bpf, cpumap: Handle skb as well when clean up ptr_ring

Kuniyuki Iwashima <kuniyu@amazon.com>
    net/sched: taprio: Limit TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME to INT_MAX.

Eric Dumazet <edumazet@google.com>
    net: add missing data-race annotation for sk_ll_usec

Eric Dumazet <edumazet@google.com>
    net: add missing data-race annotations around sk->sk_peek_off

Eric Dumazet <edumazet@google.com>
    net: add missing READ_ONCE(sk->sk_rcvbuf) annotation

Eric Dumazet <edumazet@google.com>
    net: add missing READ_ONCE(sk->sk_sndbuf) annotation

Eric Dumazet <edumazet@google.com>
    net: add missing READ_ONCE(sk->sk_rcvlowat) annotation

Eric Dumazet <edumazet@google.com>
    net: annotate data-races around sk->sk_max_pacing_rate

Konstantin Khorenko <khorenko@virtuozzo.com>
    qed: Fix scheduling in a tasklet while getting stats

Prabhakar Kushwaha <pkushwaha@marvell.com>
    qed: Fix kernel-doc warnings

Chengfeng Ye <dg573847474@gmail.com>
    mISDN: hfcpci: Fix potential deadlock on &hc->lock

Jamal Hadi Salim <jhs@mojatatu.com>
    net: sched: cls_u32: Fix match key mis-addressing

Georg Müller <georgmueller@gmx.net>
    perf test uprobe_from_different_cu: Skip if there is no gcc

Yuanjun Gong <ruc_gongyuanjun@163.com>
    net: dsa: fix value check in bcm_sf2_sw_probe()

Lin Ma <linma@zju.edu.cn>
    rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length

Lin Ma <linma@zju.edu.cn>
    bpf: Add length check for SK_DIAG_BPF_STORAGE_REQ_MAP_FD parsing

Yuanjun Gong <ruc_gongyuanjun@163.com>
    net/mlx5e: fix return value check in mlx5e_ipsec_remove_trailer()

Zhengchao Shao <shaozhengchao@huawei.com>
    net/mlx5: DR, fix memory leak in mlx5dr_cmd_create_reformat_ctx

Ilan Peer <ilan.peer@intel.com>
    wifi: cfg80211: Fix return value in scan logic

Heiko Carstens <hca@linux.ibm.com>
    KVM: s390: fix sthyi error handling

ndesaulniers@google.com <ndesaulniers@google.com>
    word-at-a-time: use the same return type for has_zero regardless of endianness

Cristian Marussi <cristian.marussi@arm.com>
    firmware: arm_scmi: Fix chan_free cleanup on SMC

Hugo Villeneuve <hvilleneuve@dimonoff.com>
    arm64: dts: imx8mn-var-som: add missing pull-up for onboard PHY reset pinmux

Robin Murphy <robin.murphy@arm.com>
    iommu/arm-smmu-v3: Document nesting-related errata

Robin Murphy <robin.murphy@arm.com>
    iommu/arm-smmu-v3: Add explicit feature for nesting

Robin Murphy <robin.murphy@arm.com>
    iommu/arm-smmu-v3: Document MMU-700 erratum 2812531

Robin Murphy <robin.murphy@arm.com>
    iommu/arm-smmu-v3: Work around MMU-600 erratum 1076982

Suzuki K Poulose <suzuki.poulose@arm.com>
    arm64: errata: Add detection for TRBE write to out-of-range

Suzuki K Poulose <suzuki.poulose@arm.com>
    arm64: errata: Add workaround for TSB flush failures

Shay Drory <shayd@nvidia.com>
    net/mlx5: Free irqs only on shutdown callback

Peter Zijlstra <peterz@infradead.org>
    perf: Fix function pointer case

Jens Axboe <axboe@kernel.dk>
    io_uring: gate iowait schedule on having pending requests


-------------

Diffstat:

 Documentation/arm64/silicon-errata.rst             |  12 +
 Makefile                                           |   4 +-
 arch/arm64/Kconfig                                 |  74 ++
 .../boot/dts/altera/socfpga_stratix10_socdk.dts    |   2 +-
 .../dts/altera/socfpga_stratix10_socdk_nand.dts    |   2 +-
 arch/arm64/boot/dts/freescale/imx8mn-var-som.dtsi  |   2 +-
 arch/arm64/include/asm/barrier.h                   |  16 +-
 arch/arm64/kernel/cpu_errata.c                     |  39 +
 arch/arm64/tools/cpucaps                           |   2 +
 arch/powerpc/include/asm/word-at-a-time.h          |   2 +-
 arch/powerpc/mm/init_64.c                          |   3 +-
 arch/s390/kernel/sthyi.c                           |   6 +-
 arch/s390/kvm/intercept.c                          |   9 +-
 drivers/base/power/power.h                         |   8 +-
 drivers/base/power/runtime.c                       |   6 +-
 drivers/base/power/wakeirq.c                       | 111 ++-
 drivers/block/rbd.c                                |  28 +-
 drivers/firmware/arm_scmi/mailbox.c                |   4 +-
 drivers/firmware/arm_scmi/smc.c                    |  21 +-
 drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_plane.c        |   8 +-
 drivers/gpu/drm/imx/ipuv3-crtc.c                   |   2 +-
 drivers/gpu/drm/ttm/ttm_bo.c                       |   3 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c        |  50 ++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h        |   8 +
 drivers/isdn/hardware/mISDN/hfcpci.c               |  10 +-
 drivers/mtd/nand/raw/fsl_upm.c                     |   2 +-
 drivers/mtd/nand/raw/meson_nand.c                  |   3 +-
 drivers/mtd/nand/raw/omap_elm.c                    |  24 +-
 drivers/mtd/nand/raw/rockchip-nand-controller.c    |  45 +-
 drivers/mtd/nand/spi/toshiba.c                     |   4 +-
 drivers/net/dsa/bcm_sf2.c                          |   8 +-
 drivers/net/ethernet/korina.c                      |   3 +-
 .../net/ethernet/marvell/prestera/prestera_pci.c   |   3 +-
 .../mellanox/mlx5/core/en_accel/ipsec_rxtx.c       |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c       |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  | 105 ++-
 drivers/net/ethernet/mellanox/mlx5/core/mlx5_irq.h |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c  |  29 +
 .../ethernet/mellanox/mlx5/core/steering/dr_cmd.c  |   5 +-
 drivers/net/ethernet/qlogic/qed/qed.h              |   9 +-
 drivers/net/ethernet/qlogic/qed/qed_cxt.h          | 138 +--
 drivers/net/ethernet/qlogic/qed/qed_dev_api.h      | 361 ++++----
 drivers/net/ethernet/qlogic/qed/qed_fcoe.c         |  19 +-
 drivers/net/ethernet/qlogic/qed/qed_fcoe.h         |  17 +-
 drivers/net/ethernet/qlogic/qed/qed_hsi.h          | 922 +++++++++++----------
 drivers/net/ethernet/qlogic/qed/qed_hw.c           |  26 +-
 drivers/net/ethernet/qlogic/qed/qed_hw.h           | 214 ++---
 drivers/net/ethernet/qlogic/qed/qed_init_ops.h     |  58 +-
 drivers/net/ethernet/qlogic/qed/qed_int.h          | 274 +++---
 drivers/net/ethernet/qlogic/qed/qed_iscsi.c        |  19 +-
 drivers/net/ethernet/qlogic/qed/qed_iscsi.h        |  17 +-
 drivers/net/ethernet/qlogic/qed/qed_l2.c           |  19 +-
 drivers/net/ethernet/qlogic/qed/qed_l2.h           | 158 ++--
 drivers/net/ethernet/qlogic/qed/qed_ll2.h          | 130 +--
 drivers/net/ethernet/qlogic/qed/qed_main.c         |   6 +-
 drivers/net/ethernet/qlogic/qed/qed_mcp.h          | 757 +++++++++--------
 drivers/net/ethernet/qlogic/qed/qed_selftest.h     |  30 +-
 drivers/net/ethernet/qlogic/qed/qed_sp.h           | 215 +++--
 drivers/net/ethernet/qlogic/qed/qed_sriov.h        |  99 ++-
 drivers/net/ethernet/qlogic/qed/qed_vf.h           | 301 ++++---
 drivers/net/ethernet/qlogic/qede/qede_main.c       |   5 +-
 drivers/net/ethernet/socionext/netsec.c            |  11 +
 drivers/net/ethernet/xilinx/ll_temac_main.c        |  16 +-
 drivers/net/tap.c                                  |   2 +-
 drivers/net/tun.c                                  |   2 +-
 drivers/net/usb/cdc_ether.c                        |  21 +
 drivers/net/usb/usbnet.c                           |   6 +
 drivers/net/usb/zaurus.c                           |  21 +
 drivers/net/wireless/mediatek/mt76/mt7615/eeprom.c |   6 +-
 drivers/s390/net/qeth_core.h                       |   1 -
 drivers/s390/net/qeth_core_main.c                  |   2 -
 drivers/s390/net/qeth_l2_main.c                    |   9 +-
 drivers/s390/net/qeth_l3_main.c                    |   8 +-
 drivers/s390/scsi/zfcp_fc.c                        |   6 +-
 drivers/scsi/storvsc_drv.c                         |   4 +
 drivers/soundwire/bus.c                            |  20 +-
 fs/ceph/mds_client.c                               |   4 +-
 fs/ceph/mds_client.h                               |   5 +
 fs/ceph/super.c                                    |  10 +
 fs/exfat/balloc.c                                  |   6 +-
 fs/exfat/dir.c                                     |  27 +-
 fs/ext2/ext2.h                                     |  12 -
 fs/ext2/super.c                                    |  23 +-
 fs/file.c                                          |  18 +-
 fs/ntfs3/attrlist.c                                |   4 +-
 fs/open.c                                          |   2 +-
 fs/super.c                                         |  11 +-
 fs/sysv/itree.c                                    |   4 +
 include/asm-generic/word-at-a-time.h               |   2 +-
 include/linux/pm_wakeirq.h                         |   9 +-
 include/linux/qed/qed_chain.h                      |  97 ++-
 include/linux/qed/qed_if.h                         | 255 +++---
 include/linux/qed/qed_iscsi_if.h                   |   2 +-
 include/linux/qed/qed_ll2_if.h                     |  42 +-
 include/linux/qed/qed_nvmetcp_if.h                 |  17 +
 include/net/vxlan.h                                |   4 +-
 io_uring/io_uring.c                                |  23 +-
 kernel/bpf/cpumap.c                                |  35 +-
 kernel/events/core.c                               |   8 +-
 kernel/trace/bpf_trace.c                           |   6 +-
 net/bluetooth/l2cap_sock.c                         |   2 +
 net/ceph/osd_client.c                              |  20 +-
 net/core/bpf_sk_storage.c                          |   5 +-
 net/core/rtnetlink.c                               |   8 +-
 net/core/sock.c                                    |  21 +-
 net/core/sock_map.c                                |   2 -
 net/dcb/dcbnl.c                                    |   2 +-
 net/ipv4/tcp_metrics.c                             |  70 +-
 net/ipv6/ip6mr.c                                   |   2 +-
 net/sched/cls_fw.c                                 |   1 -
 net/sched/cls_route.c                              |   1 -
 net/sched/cls_u32.c                                |  57 +-
 net/sched/sch_taprio.c                             |  15 +-
 net/unix/af_unix.c                                 |   2 +-
 net/wireless/scan.c                                |   2 +-
 .../tests/shell/test_uprobe_from_different_cu.sh   |   8 +-
 tools/testing/selftests/rseq/rseq.c                |  31 +-
 117 files changed, 3227 insertions(+), 2247 deletions(-)


Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Daniel Díaz 2 years, 1 month ago
Hello!

On Wed, 9 Aug 2023 at 04:57, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> This is the start of the stable review cycle for the 5.15.126 release.
> There are 92 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
>         https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.126-rc1.gz
> or in the git tree and branch at:
>         git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>
> -------------

We are also seeing build failures on Arm and Arm64, with Clang 17 and GCC 8:

* arm, build
  - clang-17-defconfig
  - clang-17-lkftconfig
  - clang-17-lkftconfig-no-kselftest-frag
  - clang-lkftconfig
  - clang-nightly-lkftconfig-kselftest
  - gcc-8-defconfig

* arm64, build
  - clang-17-defconfig
  - clang-17-defconfig-40bc7ee5
  - clang-17-lkftconfig
  - clang-17-lkftconfig-no-kselftest-frag
  - clang-lkftconfig
  - clang-nightly-lkftconfig-kselftest
  - gcc-8-defconfig
  - gcc-8-defconfig-40bc7ee5

Failure is:

-----8<-----
/builds/linux/drivers/firmware/arm_scmi/smc.c:39:13: error: duplicate member 'irq'
   39 |         int irq;
      |             ^~~
/builds/linux/drivers/firmware/arm_scmi/smc.c: In function 'smc_chan_setup':
/builds/linux/drivers/firmware/arm_scmi/smc.c:118:34: error: 'irq' undeclared (first use in this function); did you mean 'rq'?
  118 |                 scmi_info->irq = irq;
      |                                  ^~~
      |                                  rq
----->8-----

(Funnily enough, this was reported by Naresh [1] before this RC round,
but we chalked it up to GCC-13 on an older branch.)

Greetings!

Daniel Díaz
daniel.diaz@linaro.org

[1] https://lore.kernel.org/stable/CA+G9fYvTjm2oa6mXR=HUe6gYuVaS2nFb_otuvPfmPeKHDoC+Tw@mail.gmail.com/
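
The diagnostics quoted in this report follow the usual compiler layout of `path:line:col: error: message`. As a rough illustration only, the sketch below groups such lines by file, which may help when the same failure shows up across many build configurations. It assumes each diagnostic sits on a single line (mail clients often wrap them), and the names `group_errors`, `DIAG_RE`, and the `/builds/linux/` prefix handling are hypothetical helpers, not part of any kernel or LKFT tooling.

```python
import re
from collections import defaultdict

# Assumed layout: "path:line:col: error: message" on one unwrapped line.
DIAG_RE = re.compile(
    r"^(?P<path>[^\s:]+):(?P<line>\d+):(?P<col>\d+): error: (?P<msg>.+)$"
)


def group_errors(log_text, strip_prefix="/builds/linux/"):
    """Return {path: [(line, col, message), ...]} for each error line."""
    errors = defaultdict(list)
    for raw in log_text.splitlines():
        match = DIAG_RE.match(raw.strip())
        if match is None:
            # Source excerpts, caret lines and "In function" notes are skipped.
            continue
        path = match.group("path")
        if path.startswith(strip_prefix):
            path = path[len(strip_prefix):]
        errors[path].append(
            (int(match.group("line")), int(match.group("col")), match.group("msg"))
        )
    return dict(errors)


# The two errors from the report above, with wrapped lines rejoined.
SAMPLE = """\
/builds/linux/drivers/firmware/arm_scmi/smc.c:39:13: error: duplicate member 'irq'
   39 |         int irq;
      |             ^~~
/builds/linux/drivers/firmware/arm_scmi/smc.c: In function 'smc_chan_setup':
/builds/linux/drivers/firmware/arm_scmi/smc.c:118:34: error: 'irq' undeclared (first use in this function); did you mean 'rq'?
"""

if __name__ == "__main__":
    for path, items in group_errors(SAMPLE).items():
        for line, col, msg in items:
            print(f"{path}:{line}:{col}: {msg}")
```

Both errors land under the same file, `drivers/firmware/arm_scmi/smc.c`, which matches the thread's later conclusion that a single backported commit is responsible.
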
Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Ron Economos 2 years, 1 month ago
On 8/9/23 3:40 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.15.126 release.
> There are 92 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.126-rc1.gz
> or in the git tree and branch at:
> 	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Built and booted successfully on RISC-V RV64 (HiFive Unmatched).

Tested-by: Ron Economos <re@w6rz.net>
Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Guenter Roeck 2 years, 1 month ago
On Wed, Aug 09, 2023 at 12:40:36PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.15.126 release.
> There are 92 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
> Anything received after that time might be too late.
> 

Build results:
	total: 160 pass: 157 fail: 3
Failed builds:
	arm:allmodconfig
	arm64:defconfig
	arm64:allmodconfig
Qemu test results:
	total: 501 pass: 423 fail: 78
Failed tests:
	<most arm>
	<all arm64/arm64be>

As already reported, plus:

Error log:
drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_plane.c:176:20: error: 'drm_plane_helper_destroy' undeclared here

for arm:multi_v7_defconfig

Side note: I am surprised about successful arm64 tests/builds
since arm64:defconfig fails to build with obvious code errors.

drivers/firmware/arm_scmi/smc.c:39:13: error: duplicate member 'irq'

drivers/firmware/arm_scmi/smc.c: In function 'smc_chan_setup':
drivers/firmware/arm_scmi/smc.c:118:34: error: 'irq' undeclared

Guenter
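
The build and QEMU summaries in this report use a `total: N pass: N fail: N` line. A minimal sketch, assuming that exact wording, for parsing such a line and checking that the pass and fail counts add up to the total; `parse_summary` and `SUMMARY_RE` are hypothetical names chosen for illustration, not part of the test infrastructure that produced the report.

```python
import re

# Assumed summary wording: "total: 160 pass: 157 fail: 3".
SUMMARY_RE = re.compile(r"total:\s*(\d+)\s+pass:\s*(\d+)\s+fail:\s*(\d+)")


def parse_summary(line):
    """Parse one summary line and flag whether pass + fail equals total."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a summary line: {line!r}")
    total, passed, failed = (int(value) for value in match.groups())
    return {
        "total": total,
        "pass": passed,
        "fail": failed,
        "consistent": passed + failed == total,
    }


if __name__ == "__main__":
    for text in ("total: 160 pass: 157 fail: 3", "total: 501 pass: 423 fail: 78"):
        print(parse_summary(text))
```

Applied to the two lines from the report, both summaries are internally consistent: 157 + 3 = 160 and 423 + 78 = 501.
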
Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Greg Kroah-Hartman 2 years, 1 month ago
On Thu, Aug 10, 2023 at 09:06:01AM -0700, Guenter Roeck wrote:
> On Wed, Aug 09, 2023 at 12:40:36PM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 5.15.126 release.
> > There are 92 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
> > Anything received after that time might be too late.
> > 
> 
> Build results:
> 	total: 160 pass: 157 fail: 3
> Failed builds:
> 	arm:allmodconfig
> 	arm64:defconfig
> 	arm64:allmodconfig
> Qemu test results:
> 	total: 501 pass: 423 fail: 78
> Failed tests:
> 	<most arm>
> 	<all arm64/arm64be>
> 
> As already reported, plus:
> 
> Error log:
> drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_plane.c:176:20: error: 'drm_plane_helper_destroy' undeclared here

Offending commit now dropped, Sasha's dep-bot went a little crazy there,
and this wasn't needed, sorry for not catching that sooner.

> for arm:multi_v7_defconfig
> 
> Side note: I am surprised about successful arm64 tests/builds
> since arm64:defconfig fails to build with obvious code errors.
> 
> drivers/firmware/arm_scmi/smc.c:39:13: error: duplicate member 'irq'
> 
> drivers/firmware/arm_scmi/smc.c: In function 'smc_chan_setup':
> drivers/firmware/arm_scmi/smc.c:118:34: error: 'irq' undeclared

Should now be fixed, thanks.

greg k-h
Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Guenter Roeck 2 years, 1 month ago
On 8/9/23 03:40, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.15.126 release.
> There are 92 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
> Anything received after that time might be too late.
> 
Building arm:allmodconfig ... failed
--------------
Error log:
drivers/firmware/arm_scmi/smc.c:39:13: error: duplicate member 'irq'

drivers/firmware/arm_scmi/smc.c: In function 'smc_chan_setup':
drivers/firmware/arm_scmi/smc.c:118:34: error: 'irq' undeclared

Building arm64:defconfig ... failed
--------------
Error log:

drivers/firmware/arm_scmi/smc.c:39:13: error: duplicate member 'irq'

drivers/firmware/arm_scmi/smc.c: In function 'smc_chan_setup':
drivers/firmware/arm_scmi/smc.c:118:34: error: 'irq' undeclared

That is because commit d80e159dbdbb ("firmware: arm_scmi: Fix
chan_free cleanup on SMC") is applied without its dependent commit(s).

Guenter
Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Florian Fainelli 2 years, 1 month ago
On 8/10/23 03:24, Guenter Roeck wrote:
> On 8/9/23 03:40, Greg Kroah-Hartman wrote:
>> This is the start of the stable review cycle for the 5.15.126 release.
>> There are 92 patches in this series, all will be posted as a response
>> to this one.  If anyone has any issues with these being applied, please
>> let me know.
>>
>> Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
>> Anything received after that time might be too late.
>>
> Building arm:allmodconfig ... failed
> --------------
> Error log:
> drivers/firmware/arm_scmi/smc.c:39:13: error: duplicate member 'irq'
> 
> drivers/firmware/arm_scmi/smc.c: In function 'smc_chan_setup':
> drivers/firmware/arm_scmi/smc.c:118:34: error: 'irq' undeclared
> 
> Building arm64:defconfig ... failed
> --------------
> Error log:
> 
> drivers/firmware/arm_scmi/smc.c:39:13: error: duplicate member 'irq'
> 
> drivers/firmware/arm_scmi/smc.c: In function 'smc_chan_setup':
> drivers/firmware/arm_scmi/smc.c:118:34: error: 'irq' undeclared
> 
> That is because commit d80e159dbdbb ("firmware: arm_scmi: Fix chan_free
> cleanup on SMC") was applied without its prerequisite commit(s).

Indeed, we discussed this here: 
https://lore.kernel.org/all/20230810084529.53thk6dmlejbma3t@bogus/
-- 
Florian

Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Greg Kroah-Hartman 2 years, 1 month ago
On Thu, Aug 10, 2023 at 09:25:53AM -0700, Florian Fainelli wrote:
> On 8/10/23 03:24, Guenter Roeck wrote:
> > On 8/9/23 03:40, Greg Kroah-Hartman wrote:
> > > This is the start of the stable review cycle for the 5.15.126 release.
> > > There are 92 patches in this series, all will be posted as a response
> > > to this one.  If anyone has any issues with these being applied, please
> > > let me know.
> > > 
> > > Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
> > > Anything received after that time might be too late.
> > > 
> > Building arm:allmodconfig ... failed
> > --------------
> > Error log:
> > drivers/firmware/arm_scmi/smc.c:39:13: error: duplicate member 'irq'
> > 
> > drivers/firmware/arm_scmi/smc.c: In function 'smc_chan_setup':
> > drivers/firmware/arm_scmi/smc.c:118:34: error: 'irq' undeclared
> > 
> > Building arm64:defconfig ... failed
> > --------------
> > Error log:
> > 
> > drivers/firmware/arm_scmi/smc.c:39:13: error: duplicate member 'irq'
> > 
> > drivers/firmware/arm_scmi/smc.c: In function 'smc_chan_setup':
> > drivers/firmware/arm_scmi/smc.c:118:34: error: 'irq' undeclared
> > 
> > That is because commit d80e159dbdbb ("firmware: arm_scmi: Fix chan_free
> > cleanup on SMC") was applied without its prerequisite commit(s).
> 
> Indeed, we discussed this here:
> https://lore.kernel.org/all/20230810084529.53thk6dmlejbma3t@bogus/

Offending commit should now be dropped, thanks.

greg k-h
Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Harshit Mogalapalli 2 years, 1 month ago
Hi Greg,

On 09/08/23 4:10 pm, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.15.126 release.
> There are 92 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
> Anything received after that time might be too late.
> 
No problems seen on x86_64 and aarch64.

Tested-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>

Thanks,
Harshit

> The whole patch series can be found in one patch at:
> 	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.126-rc1.gz
> or in the git tree and branch at:
> 	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
>
Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Guenter Roeck 2 years, 1 month ago
On 8/10/23 03:16, Harshit Mogalapalli wrote:
> Hi Greg,
> 
> On 09/08/23 4:10 pm, Greg Kroah-Hartman wrote:
>> This is the start of the stable review cycle for the 5.15.126 release.
>> There are 92 patches in this series, all will be posted as a response
>> to this one.  If anyone has any issues with these being applied, please
>> let me know.
>>
>> Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
>> Anything received after that time might be too late.
>>
> No problems seen on x86_64 and aarch64.
> 

fwiw, aarch64:allmodconfig doesn't compile.

Guenter

> Tested-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
> 
> Thanks,
> Harshit
> 
>> The whole patch series can be found in one patch at:
>>     https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.126-rc1.gz
>> or in the git tree and branch at:
>>     git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
>> and the diffstat can be found below.
>>
>> thanks,
>>
>> greg k-h
>>

Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Florian Fainelli 2 years, 1 month ago
On 8/9/23 03:40, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.15.126 release.
> There are 92 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
> 	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.126-rc1.gz
> or in the git tree and branch at:
> 	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h

SCMI with the SMC transport fails to build because of "[PATCH 5.15
11/92] firmware: arm_scmi: Fix chan_free cleanup on SMC"; the specific
details have been reported in that patch's thread. Here is the build
failure FWIW:

drivers/firmware/arm_scmi/smc.c:39:6: error: duplicate member 'irq'
   int irq;
       ^~~
drivers/firmware/arm_scmi/smc.c: In function 'smc_chan_setup':
drivers/firmware/arm_scmi/smc.c:118:20: error: 'irq' undeclared (first use in this function); did you mean 'rq'?
    scmi_info->irq = irq;
                     ^~~
                     rq
drivers/firmware/arm_scmi/smc.c:118:20: note: each undeclared identifier is reported only once for each function it appears in
   CC      drivers/mmc/core/slot-gpio.o
host-make[5]: *** [scripts/Makefile.build:289: drivers/firmware/arm_scmi/smc.o] Error 1
host-make[5]: *** Waiting for unfinished jobs....
-- 
Florian
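
The compiler diagnostics quoted in this thread follow the usual
`file:line:column: severity: message` shape, so a failing build log can be
summarized mechanically. The following is only a minimal sketch under that
assumption; the names `DIAG_RE`, `parse_gcc_errors` and `SAMPLE_LOG` are
hypothetical and are not part of any kernel tooling.

```python
import re
from collections import defaultdict

# Matches GCC-style diagnostics such as
#   drivers/firmware/arm_scmi/smc.c:39:13: error: duplicate member 'irq'
# Context lines like "file.c: In function 'foo':" have no line/column
# numbers and therefore do not match.
DIAG_RE = re.compile(
    r"^(?P<file>[^\s:]+):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<severity>error|warning|note): (?P<msg>.*)$"
)


def parse_gcc_errors(log_text):
    """Return {file: [(line, col, message), ...]} for 'error' diagnostics only."""
    errors = defaultdict(list)
    for raw in log_text.splitlines():
        m = DIAG_RE.match(raw.strip())
        if m and m.group("severity") == "error":
            errors[m.group("file")].append(
                (int(m.group("line")), int(m.group("col")), m.group("msg"))
            )
    return dict(errors)


# Sample input copied from the error log reported in this thread.
SAMPLE_LOG = """\
drivers/firmware/arm_scmi/smc.c:39:13: error: duplicate member 'irq'

drivers/firmware/arm_scmi/smc.c: In function 'smc_chan_setup':
drivers/firmware/arm_scmi/smc.c:118:34: error: 'irq' undeclared
"""

if __name__ == "__main__":
    for path, diags in parse_gcc_errors(SAMPLE_LOG).items():
        for line, col, msg in diags:
            print(f"{path}:{line}:{col}: {msg}")
```

Grouping by file makes it easier to see that both errors come from the same
source file, which is consistent with a single backported commit being the
cause.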
Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by SeongJae Park 2 years, 1 month ago
Hello,

On 2023-08-09T12:40:36+02:00   Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:

> This is the start of the stable review cycle for the 5.15.126 release.
> There are 92 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>         https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.126-rc1.gz
> or in the git tree and branch at:
>         git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> and the diffstat can be found below.

This rc kernel passes the DAMON functionality test[1] on my test machine.
The test results summary is attached below.  Please note that I retrieved the
kernel from the linux-stable-rc tree[2].

Tested-by: SeongJae Park <sj@kernel.org>

[1] https://github.com/awslabs/damon-tests/tree/next/corr
[2] ae7f23cbf199 ("Linux 5.15.126-rc1")


Thanks,
SJ

[...]

---

ok 1 selftests: damon: debugfs_attrs.sh
ok 1 selftests: damon-tests: kunit.sh
ok 2 selftests: damon-tests: huge_count_read_write.sh
ok 3 selftests: damon-tests: buffer_overflow.sh
ok 4 selftests: damon-tests: rm_contexts.sh
ok 5 selftests: damon-tests: record_null_deref.sh
ok 6 selftests: damon-tests: dbgfs_target_ids_read_before_terminate_race.sh
ok 7 selftests: damon-tests: dbgfs_target_ids_pid_leak.sh
ok 8 selftests: damon-tests: damo_tests.sh
ok 9 selftests: damon-tests: masim-record.sh
ok 10 selftests: damon-tests: build_i386.sh
ok 11 selftests: damon-tests: build_m68k.sh
ok 12 selftests: damon-tests: build_arm64.sh
ok 13 selftests: damon-tests: build_i386_idle_flag.sh
ok 14 selftests: damon-tests: build_i386_highpte.sh
ok 15 selftests: damon-tests: build_nomemcg.sh

PASS
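
The result lines above use the `ok N selftests: <suite>: <test>` shape, so
a summary like this can be tallied with a few lines of script. This is a
rough sketch assuming that line shape (and `not ok` for failures); the
names `RESULT_RE` and `summarize` are hypothetical, and the failing line in
the sample is made up purely for illustration.

```python
import re

# "ok 3 selftests: damon-tests: buffer_overflow.sh" or
# "not ok 3 selftests: damon-tests: buffer_overflow.sh"
RESULT_RE = re.compile(
    r"^(?P<status>ok|not ok) (?P<num>\d+) selftests: "
    r"(?P<suite>[^:]+): (?P<name>\S+)"
)


def summarize(lines):
    """Split result lines into (passed, failed) lists of (suite, test) pairs."""
    passed, failed = [], []
    for raw in lines:
        m = RESULT_RE.match(raw.strip())
        if not m:
            continue  # skip separators, "PASS", blank lines, etc.
        entry = (m.group("suite"), m.group("name"))
        if m.group("status") == "ok":
            passed.append(entry)
        else:
            failed.append(entry)
    return passed, failed


if __name__ == "__main__":
    sample = [
        "ok 1 selftests: damon: debugfs_attrs.sh",
        "ok 1 selftests: damon-tests: kunit.sh",
        # Hypothetical failure, not present in the reported results:
        "not ok 2 selftests: damon-tests: example_failure.sh",
        "PASS",
    ]
    passed, failed = summarize(sample)
    print(f"{len(passed)} passed, {len(failed)} failed")
```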
Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Joel Fernandes 2 years, 1 month ago
On Wed, Aug 09, 2023 at 12:40:36PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.15.126 release.
> There are 92 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
> 	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.126-rc1.gz
> or in the git tree and branch at:
> 	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> and the diffstat can be found below.

Not necessarily new with 5.15 stable, but 3 of the 19 rcutorture scenarios
hang with this -rc: TREE04, TREE07, TASKS03.

5.15 has a known stop machine issue where it hangs after 1.5 hours of CPU
hotplug rcutorture testing. tglx and I are continuing to debug this. The
issue shows up only on 5.15 stable kernels, not on other stable kernels or
on mainline.

I will do some more runs to see if the TASKS03 hang is new, but it could
be related to the existing issues.

thanks,

 - Joel



Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Guenter Roeck 2 years, 1 month ago
On 8/9/23 06:53, Joel Fernandes wrote:
> On Wed, Aug 09, 2023 at 12:40:36PM +0200, Greg Kroah-Hartman wrote:
>> This is the start of the stable review cycle for the 5.15.126 release.
>> There are 92 patches in this series, all will be posted as a response
>> to this one.  If anyone has any issues with these being applied, please
>> let me know.
>>
>> Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
>> Anything received after that time might be too late.
>>
>> The whole patch series can be found in one patch at:
>> 	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.126-rc1.gz
>> or in the git tree and branch at:
>> 	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
>> and the diffstat can be found below.
> 
> Not necessarily new with 5.15 stable, but 3 of the 19 rcutorture scenarios
> hang with this -rc: TREE04, TREE07, TASKS03.
> 
> 5.15 has a known stop machine issue where it hangs after 1.5 hours of CPU
> hotplug rcutorture testing. tglx and I are continuing to debug this. The
> issue shows up only on 5.15 stable kernels, not on other stable kernels or
> on mainline.
> 

Do you by any chance have a crash pattern that we could use to find the crash
in ChromeOS crash logs? No idea if that would help, but it could provide some
additional data points.

Thanks,
Guenter

> I will do some more runs to see if the TASKS03 hang is new, but it could
> be related to the existing issues.
> 
> thanks,
> 
>   - Joel
> 
> 
> 
>>
>> thanks,
>>
>> greg k-h
>>
>> -------------
>> Pseudo-Shortlog of commits:
>>
>> Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>      Linux 5.15.126-rc1
>>
>> Johan Hovold <johan+linaro@kernel.org>
>>      PM: sleep: wakeirq: fix wake irq arming
>>
>> Chunfeng Yun <chunfeng.yun@mediatek.com>
>>      PM / wakeirq: support enabling wake-up irq after runtime_suspend called
>>
>> Johan Hovold <johan+linaro@kernel.org>
>>      soundwire: fix enumeration completion
>>
>> Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
>>      soundwire: bus: pm_runtime_request_resume on peripheral attachment
>>
>> Sean Christopherson <seanjc@google.com>
>>      selftests/rseq: Play nice with binaries statically linked against glibc 2.35+
>>
>> Michael Jeanson <mjeanson@efficios.com>
>>      selftests/rseq: check if libc rseq support is registered
>>
>> Alexander Stein <alexander.stein@ew.tq-group.com>
>>      drm/imx/ipuv3: Fix front porch adjustment upon hactive aligning
>>
>> Thomas Zimmermann <tzimmermann@suse.de>
>>      drm/fsl-dcu: Use drm_plane_helper_destroy()
>>
>> Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>      powerpc/mm/altmap: Fix altmap boundary check
>>
>> Christophe JAILLET <christophe.jaillet@wanadoo.fr>
>>      mtd: rawnand: fsl_upm: Fix an off-by one test in fun_exec_op()
>>
>> Johan Jonker <jbx6244@gmail.com>
>>      mtd: rawnand: rockchip: Align hwecc vs. raw page helper layouts
>>
>> Johan Jonker <jbx6244@gmail.com>
>>      mtd: rawnand: rockchip: fix oobfree offset and description
>>
>> Roger Quadros <rogerq@kernel.org>
>>      mtd: rawnand: omap_elm: Fix incorrect type in assignment
>>
>> Jan Kara <jack@suse.cz>
>>      ext2: Drop fragment support
>>
>> Jan Kara <jack@suse.cz>
>>      fs: Protect reconfiguration of sb read-write from racing writes
>>
>> Alan Stern <stern@rowland.harvard.edu>
>>      net: usbnet: Fix WARNING in usbnet_start_xmit/usb_submit_urb
>>
>> Sungwoo Kim <iam@sung-woo.kim>
>>      Bluetooth: L2CAP: Fix use-after-free in l2cap_sock_ready_cb
>>
>> Prince Kumar Maurya <princekumarmaurya06@gmail.com>
>>      fs/sysv: Null check to prevent null-ptr-deref bug
>>
>> Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>>      fs/ntfs3: Use __GFP_NOWARN allocation at ntfs_load_attr_list()
>>
>> Linus Torvalds <torvalds@linux-foundation.org>
>>      file: reinstate f_pos locking optimization for regular files
>>
>> Hou Tao <houtao1@huawei.com>
>>      bpf, cpumap: Make sure kthread is running before map update returns
>>
>> Guchun Chen <guchun.chen@amd.com>
>>      drm/ttm: check null pointer before accessing when swapping
>>
>> Aleksa Sarai <cyphar@cyphar.com>
>>      open: make RESOLVE_CACHED correctly test for O_TMPFILE
>>
>> Jiri Olsa <jolsa@kernel.org>
>>      bpf: Disable preemption in bpf_event_output
>>
>> Ilya Dryomov <idryomov@gmail.com>
>>      rbd: prevent busy loop when requesting exclusive lock
>>
>> Paul Fertser <fercerpav@gmail.com>
>>      wifi: mt76: mt7615: do not advertise 5 GHz on first phy of MT7615D (DBDC)
>>
>> Laszlo Ersek <lersek@redhat.com>
>>      net: tap_open(): set sk_uid from current_fsuid()
>>
>> Laszlo Ersek <lersek@redhat.com>
>>      net: tun_chr_open(): set sk_uid from current_fsuid()
>>
>> Dinh Nguyen <dinguyen@kernel.org>
>>      arm64: dts: stratix10: fix incorrect I2C property for SCL signal
>>
>> Arseniy Krasnov <AVKrasnov@sberdevices.ru>
>>      mtd: rawnand: meson: fix OOB available bytes for ECC
>>
>> Olivier Maignial <olivier.maignial@hotmail.fr>
>>      mtd: spinand: toshiba: Fix ecc_get_status
>>
>> Sungjong Seo <sj1557.seo@samsung.com>
>>      exfat: release s_lock before calling dir_emit()
>>
>> gaoming <gaoming20@hihonor.com>
>>      exfat: use kvmalloc_array/kvfree instead of kmalloc_array/kfree
>>
>> Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
>>      firmware: arm_scmi: Drop OF node reference in the transport channel setup
>>
>> Xiubo Li <xiubli@redhat.com>
>>      ceph: defer stopping mdsc delayed_work
>>
>> Ross Maynard <bids.7405@bigpond.com>
>>      USB: zaurus: Add ID for A-300/B-500/C-700
>>
>> Ilya Dryomov <idryomov@gmail.com>
>>      libceph: fix potential hang in ceph_osdc_notify()
>>
>> Michael Kelley <mikelley@microsoft.com>
>>      scsi: storvsc: Limit max_sectors for virtual Fibre Channel devices
>>
>> Steffen Maier <maier@linux.ibm.com>
>>      scsi: zfcp: Defer fc_rport blocking until after ADISC response
>>
>> Eric Dumazet <edumazet@google.com>
>>      tcp_metrics: fix data-race in tcpm_suck_dst() vs fastopen
>>
>> Eric Dumazet <edumazet@google.com>
>>      tcp_metrics: annotate data-races around tm->tcpm_net
>>
>> Eric Dumazet <edumazet@google.com>
>>      tcp_metrics: annotate data-races around tm->tcpm_vals[]
>>
>> Eric Dumazet <edumazet@google.com>
>>      tcp_metrics: annotate data-races around tm->tcpm_lock
>>
>> Eric Dumazet <edumazet@google.com>
>>      tcp_metrics: annotate data-races around tm->tcpm_stamp
>>
>> Eric Dumazet <edumazet@google.com>
>>      tcp_metrics: fix addr_same() helper
>>
>> Jonas Gorski <jonas.gorski@bisdn.de>
>>      prestera: fix fallback to previous version on same major version
>>
>> Jianbo Liu <jianbol@nvidia.com>
>>      net/mlx5: fs_core: Skip the FTs in the same FS_TYPE_PRIO_CHAINS fs_prio
>>
>> Jianbo Liu <jianbol@nvidia.com>
>>      net/mlx5: fs_core: Make find_closest_ft more generic
>>
>> Benjamin Poirier <bpoirier@nvidia.com>
>>      vxlan: Fix nexthop hash size
>>
>> Yue Haibing <yuehaibing@huawei.com>
>>      ip6mr: Fix skb_under_panic in ip6mr_cache_report()
>>
>> Alexandra Winter <wintera@linux.ibm.com>
>>      s390/qeth: Don't call dev_close/dev_open (DOWN/UP)
>>
>> Lin Ma <linma@zju.edu.cn>
>>      net: dcb: choose correct policy to parse DCB_ATTR_BCN
>>
>> Mark Brown <broonie@kernel.org>
>>      net: netsec: Ignore 'phy-mode' on SynQuacer in DT mode
>>
>> Yuanjun Gong <ruc_gongyuanjun@163.com>
>>      net: korina: handle clk prepare error in korina_probe()
>>
>> Dan Carpenter <dan.carpenter@linaro.org>
>>      net: ll_temac: fix error checking of irq_of_parse_and_map()
>>
>> Yang Yingliang <yangyingliang@huawei.com>
>>      net: ll_temac: Switch to use dev_err_probe() helper
>>
>> Tomas Glozar <tglozar@redhat.com>
>>      bpf: sockmap: Remove preempt_disable in sock_map_sk_acquire
>>
>> valis <sec@valis.email>
>>      net/sched: cls_route: No longer copy tcf_result on update to avoid use-after-free
>>
>> valis <sec@valis.email>
>>      net/sched: cls_fw: No longer copy tcf_result on update to avoid use-after-free
>>
>> valis <sec@valis.email>
>>      net/sched: cls_u32: No longer copy tcf_result on update to avoid use-after-free
>>
>> Hou Tao <houtao1@huawei.com>
>>      bpf, cpumap: Handle skb as well when clean up ptr_ring
>>
>> Kuniyuki Iwashima <kuniyu@amazon.com>
>>      net/sched: taprio: Limit TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME to INT_MAX.
>>
>> Eric Dumazet <edumazet@google.com>
>>      net: add missing data-race annotation for sk_ll_usec
>>
>> Eric Dumazet <edumazet@google.com>
>>      net: add missing data-race annotations around sk->sk_peek_off
>>
>> Eric Dumazet <edumazet@google.com>
>>      net: add missing READ_ONCE(sk->sk_rcvbuf) annotation
>>
>> Eric Dumazet <edumazet@google.com>
>>      net: add missing READ_ONCE(sk->sk_sndbuf) annotation
>>
>> Eric Dumazet <edumazet@google.com>
>>      net: add missing READ_ONCE(sk->sk_rcvlowat) annotation
>>
>> Eric Dumazet <edumazet@google.com>
>>      net: annotate data-races around sk->sk_max_pacing_rate
>>
>> Konstantin Khorenko <khorenko@virtuozzo.com>
>>      qed: Fix scheduling in a tasklet while getting stats
>>
>> Prabhakar Kushwaha <pkushwaha@marvell.com>
>>      qed: Fix kernel-doc warnings
>>
>> Chengfeng Ye <dg573847474@gmail.com>
>>      mISDN: hfcpci: Fix potential deadlock on &hc->lock
>>
>> Jamal Hadi Salim <jhs@mojatatu.com>
>>      net: sched: cls_u32: Fix match key mis-addressing
>>
>> Georg Müller <georgmueller@gmx.net>
>>      perf test uprobe_from_different_cu: Skip if there is no gcc
>>
>> Yuanjun Gong <ruc_gongyuanjun@163.com>
>>      net: dsa: fix value check in bcm_sf2_sw_probe()
>>
>> Lin Ma <linma@zju.edu.cn>
>>      rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length
>>
>> Lin Ma <linma@zju.edu.cn>
>>      bpf: Add length check for SK_DIAG_BPF_STORAGE_REQ_MAP_FD parsing
>>
>> Yuanjun Gong <ruc_gongyuanjun@163.com>
>>      net/mlx5e: fix return value check in mlx5e_ipsec_remove_trailer()
>>
>> Zhengchao Shao <shaozhengchao@huawei.com>
>>      net/mlx5: DR, fix memory leak in mlx5dr_cmd_create_reformat_ctx
>>
>> Ilan Peer <ilan.peer@intel.com>
>>      wifi: cfg80211: Fix return value in scan logic
>>
>> Heiko Carstens <hca@linux.ibm.com>
>>      KVM: s390: fix sthyi error handling
>>
>> ndesaulniers@google.com <ndesaulniers@google.com>
>>      word-at-a-time: use the same return type for has_zero regardless of endianness
>>
>> Cristian Marussi <cristian.marussi@arm.com>
>>      firmware: arm_scmi: Fix chan_free cleanup on SMC
>>
>> Hugo Villeneuve <hvilleneuve@dimonoff.com>
>>      arm64: dts: imx8mn-var-som: add missing pull-up for onboard PHY reset pinmux
>>
>> Robin Murphy <robin.murphy@arm.com>
>>      iommu/arm-smmu-v3: Document nesting-related errata
>>
>> Robin Murphy <robin.murphy@arm.com>
>>      iommu/arm-smmu-v3: Add explicit feature for nesting
>>
>> Robin Murphy <robin.murphy@arm.com>
>>      iommu/arm-smmu-v3: Document MMU-700 erratum 2812531
>>
>> Robin Murphy <robin.murphy@arm.com>
>>      iommu/arm-smmu-v3: Work around MMU-600 erratum 1076982
>>
>> Suzuki K Poulose <suzuki.poulose@arm.com>
>>      arm64: errata: Add detection for TRBE write to out-of-range
>>
>> Suzuki K Poulose <suzuki.poulose@arm.com>
>>      arm64: errata: Add workaround for TSB flush failures
>>
>> Shay Drory <shayd@nvidia.com>
>>      net/mlx5: Free irqs only on shutdown callback
>>
>> Peter Zijlstra <peterz@infradead.org>
>>      perf: Fix function pointer case
>>
>> Jens Axboe <axboe@kernel.dk>
>>      io_uring: gate iowait schedule on having pending requests
>>
>>
>> -------------
>>
>> Diffstat:
>>
>>   Documentation/arm64/silicon-errata.rst             |  12 +
>>   Makefile                                           |   4 +-
>>   arch/arm64/Kconfig                                 |  74 ++
>>   .../boot/dts/altera/socfpga_stratix10_socdk.dts    |   2 +-
>>   .../dts/altera/socfpga_stratix10_socdk_nand.dts    |   2 +-
>>   arch/arm64/boot/dts/freescale/imx8mn-var-som.dtsi  |   2 +-
>>   arch/arm64/include/asm/barrier.h                   |  16 +-
>>   arch/arm64/kernel/cpu_errata.c                     |  39 +
>>   arch/arm64/tools/cpucaps                           |   2 +
>>   arch/powerpc/include/asm/word-at-a-time.h          |   2 +-
>>   arch/powerpc/mm/init_64.c                          |   3 +-
>>   arch/s390/kernel/sthyi.c                           |   6 +-
>>   arch/s390/kvm/intercept.c                          |   9 +-
>>   drivers/base/power/power.h                         |   8 +-
>>   drivers/base/power/runtime.c                       |   6 +-
>>   drivers/base/power/wakeirq.c                       | 111 ++-
>>   drivers/block/rbd.c                                |  28 +-
>>   drivers/firmware/arm_scmi/mailbox.c                |   4 +-
>>   drivers/firmware/arm_scmi/smc.c                    |  21 +-
>>   drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_plane.c        |   8 +-
>>   drivers/gpu/drm/imx/ipuv3-crtc.c                   |   2 +-
>>   drivers/gpu/drm/ttm/ttm_bo.c                       |   3 +-
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c        |  50 ++
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h        |   8 +
>>   drivers/isdn/hardware/mISDN/hfcpci.c               |  10 +-
>>   drivers/mtd/nand/raw/fsl_upm.c                     |   2 +-
>>   drivers/mtd/nand/raw/meson_nand.c                  |   3 +-
>>   drivers/mtd/nand/raw/omap_elm.c                    |  24 +-
>>   drivers/mtd/nand/raw/rockchip-nand-controller.c    |  45 +-
>>   drivers/mtd/nand/spi/toshiba.c                     |   4 +-
>>   drivers/net/dsa/bcm_sf2.c                          |   8 +-
>>   drivers/net/ethernet/korina.c                      |   3 +-
>>   .../net/ethernet/marvell/prestera/prestera_pci.c   |   3 +-
>>   .../mellanox/mlx5/core/en_accel/ipsec_rxtx.c       |   4 +-
>>   drivers/net/ethernet/mellanox/mlx5/core/eq.c       |   2 +-
>>   drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  | 105 ++-
>>   drivers/net/ethernet/mellanox/mlx5/core/mlx5_irq.h |   1 +
>>   drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c  |  29 +
>>   .../ethernet/mellanox/mlx5/core/steering/dr_cmd.c  |   5 +-
>>   drivers/net/ethernet/qlogic/qed/qed.h              |   9 +-
>>   drivers/net/ethernet/qlogic/qed/qed_cxt.h          | 138 +--
>>   drivers/net/ethernet/qlogic/qed/qed_dev_api.h      | 361 ++++----
>>   drivers/net/ethernet/qlogic/qed/qed_fcoe.c         |  19 +-
>>   drivers/net/ethernet/qlogic/qed/qed_fcoe.h         |  17 +-
>>   drivers/net/ethernet/qlogic/qed/qed_hsi.h          | 922 +++++++++++----------
>>   drivers/net/ethernet/qlogic/qed/qed_hw.c           |  26 +-
>>   drivers/net/ethernet/qlogic/qed/qed_hw.h           | 214 ++---
>>   drivers/net/ethernet/qlogic/qed/qed_init_ops.h     |  58 +-
>>   drivers/net/ethernet/qlogic/qed/qed_int.h          | 274 +++---
>>   drivers/net/ethernet/qlogic/qed/qed_iscsi.c        |  19 +-
>>   drivers/net/ethernet/qlogic/qed/qed_iscsi.h        |  17 +-
>>   drivers/net/ethernet/qlogic/qed/qed_l2.c           |  19 +-
>>   drivers/net/ethernet/qlogic/qed/qed_l2.h           | 158 ++--
>>   drivers/net/ethernet/qlogic/qed/qed_ll2.h          | 130 +--
>>   drivers/net/ethernet/qlogic/qed/qed_main.c         |   6 +-
>>   drivers/net/ethernet/qlogic/qed/qed_mcp.h          | 757 +++++++++--------
>>   drivers/net/ethernet/qlogic/qed/qed_selftest.h     |  30 +-
>>   drivers/net/ethernet/qlogic/qed/qed_sp.h           | 215 +++--
>>   drivers/net/ethernet/qlogic/qed/qed_sriov.h        |  99 ++-
>>   drivers/net/ethernet/qlogic/qed/qed_vf.h           | 301 ++++---
>>   drivers/net/ethernet/qlogic/qede/qede_main.c       |   5 +-
>>   drivers/net/ethernet/socionext/netsec.c            |  11 +
>>   drivers/net/ethernet/xilinx/ll_temac_main.c        |  16 +-
>>   drivers/net/tap.c                                  |   2 +-
>>   drivers/net/tun.c                                  |   2 +-
>>   drivers/net/usb/cdc_ether.c                        |  21 +
>>   drivers/net/usb/usbnet.c                           |   6 +
>>   drivers/net/usb/zaurus.c                           |  21 +
>>   drivers/net/wireless/mediatek/mt76/mt7615/eeprom.c |   6 +-
>>   drivers/s390/net/qeth_core.h                       |   1 -
>>   drivers/s390/net/qeth_core_main.c                  |   2 -
>>   drivers/s390/net/qeth_l2_main.c                    |   9 +-
>>   drivers/s390/net/qeth_l3_main.c                    |   8 +-
>>   drivers/s390/scsi/zfcp_fc.c                        |   6 +-
>>   drivers/scsi/storvsc_drv.c                         |   4 +
>>   drivers/soundwire/bus.c                            |  20 +-
>>   fs/ceph/mds_client.c                               |   4 +-
>>   fs/ceph/mds_client.h                               |   5 +
>>   fs/ceph/super.c                                    |  10 +
>>   fs/exfat/balloc.c                                  |   6 +-
>>   fs/exfat/dir.c                                     |  27 +-
>>   fs/ext2/ext2.h                                     |  12 -
>>   fs/ext2/super.c                                    |  23 +-
>>   fs/file.c                                          |  18 +-
>>   fs/ntfs3/attrlist.c                                |   4 +-
>>   fs/open.c                                          |   2 +-
>>   fs/super.c                                         |  11 +-
>>   fs/sysv/itree.c                                    |   4 +
>>   include/asm-generic/word-at-a-time.h               |   2 +-
>>   include/linux/pm_wakeirq.h                         |   9 +-
>>   include/linux/qed/qed_chain.h                      |  97 ++-
>>   include/linux/qed/qed_if.h                         | 255 +++---
>>   include/linux/qed/qed_iscsi_if.h                   |   2 +-
>>   include/linux/qed/qed_ll2_if.h                     |  42 +-
>>   include/linux/qed/qed_nvmetcp_if.h                 |  17 +
>>   include/net/vxlan.h                                |   4 +-
>>   io_uring/io_uring.c                                |  23 +-
>>   kernel/bpf/cpumap.c                                |  35 +-
>>   kernel/events/core.c                               |   8 +-
>>   kernel/trace/bpf_trace.c                           |   6 +-
>>   net/bluetooth/l2cap_sock.c                         |   2 +
>>   net/ceph/osd_client.c                              |  20 +-
>>   net/core/bpf_sk_storage.c                          |   5 +-
>>   net/core/rtnetlink.c                               |   8 +-
>>   net/core/sock.c                                    |  21 +-
>>   net/core/sock_map.c                                |   2 -
>>   net/dcb/dcbnl.c                                    |   2 +-
>>   net/ipv4/tcp_metrics.c                             |  70 +-
>>   net/ipv6/ip6mr.c                                   |   2 +-
>>   net/sched/cls_fw.c                                 |   1 -
>>   net/sched/cls_route.c                              |   1 -
>>   net/sched/cls_u32.c                                |  57 +-
>>   net/sched/sch_taprio.c                             |  15 +-
>>   net/unix/af_unix.c                                 |   2 +-
>>   net/wireless/scan.c                                |   2 +-
>>   .../tests/shell/test_uprobe_from_different_cu.sh   |   8 +-
>>   tools/testing/selftests/rseq/rseq.c                |  31 +-
>>   117 files changed, 3227 insertions(+), 2247 deletions(-)
>>
>>

Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Joel Fernandes 2 years, 1 month ago
On Wed, Aug 9, 2023 at 12:18 PM Guenter Roeck <linux@roeck-us.net> wrote:
>
> On 8/9/23 06:53, Joel Fernandes wrote:
> > On Wed, Aug 09, 2023 at 12:40:36PM +0200, Greg Kroah-Hartman wrote:
> >> This is the start of the stable review cycle for the 5.15.126 release.
> >> There are 92 patches in this series, all will be posted as a response
> >> to this one.  If anyone has any issues with these being applied, please
> >> let me know.
> >>
> >> Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
> >> Anything received after that time might be too late.
> >>
> >> The whole patch series can be found in one patch at:
> >>      https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.126-rc1.gz
> >> or in the git tree and branch at:
> >>      git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> >> and the diffstat can be found below.
> >
> > Not necessarily new with 5.15 stable but 3 of the 19 rcutorture scenarios
> > hang with this -rc: TREE04, TREE07, TASKS03.
> >
> > 5.15 has a known stop machine issue where it hangs after 1.5 hours with cpu
> > hotplug rcutorture testing. Me and tglx are continuing to debug this. The
> > issue does not show up on anything but 5.15 stable kernels and neither on
> > mainline.
> >
>
> Do you by any chance have a crash pattern that we could possibly use to find the crash
> in ChromeOS crash logs ? No idea if that would help, but it could provide some
> additional data points.

The pattern shows as a hard hang: the system is unresponsive and all CPUs
are stuck in stop_machine. Sometimes it recovers on its own from the
hang and then RCU immediately gives stall warnings. It takes about 1.5 hours
to reproduce, and sometimes it does not happen for several hours.

It appears related to CPU hotplug since gdb showed me most of the CPUs
are spinning in multi_cpu_stop() / stop machine after the hang.

thanks,

 - Joel
Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Guenter Roeck 2 years, 1 month ago
On Wed, Aug 09, 2023 at 02:35:59PM -0400, Joel Fernandes wrote:
> On Wed, Aug 9, 2023 at 12:18 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >
> > On 8/9/23 06:53, Joel Fernandes wrote:
> > > On Wed, Aug 09, 2023 at 12:40:36PM +0200, Greg Kroah-Hartman wrote:
> > >> This is the start of the stable review cycle for the 5.15.126 release.
> > >> There are 92 patches in this series, all will be posted as a response
> > >> to this one.  If anyone has any issues with these being applied, please
> > >> let me know.
> > >>
> > >> Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
> > >> Anything received after that time might be too late.
> > >>
> > >> The whole patch series can be found in one patch at:
> > >>      https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.126-rc1.gz
> > >> or in the git tree and branch at:
> > >>      git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> > >> and the diffstat can be found below.
> > >
> > > Not necessarily new with 5.15 stable but 3 of the 19 rcutorture scenarios
> > > hang with this -rc: TREE04, TREE07, TASKS03.
> > >
> > > 5.15 has a known stop machine issue where it hangs after 1.5 hours with cpu
> > > hotplug rcutorture testing. Me and tglx are continuing to debug this. The
> > > issue does not show up on anything but 5.15 stable kernels and neither on
> > > mainline.
> > >
> >
> > Do you by any chance have a crash pattern that we could possibly use to find the crash
> > in ChromeOS crash logs ? No idea if that would help, but it could provide some
> > additional data points.
> 
> The pattern shows as a hard hang, the system is unresponsive and all CPUs
> are stuck in stop_machine. Sometimes it recovers on its own from the
> hang and then RCU immediately gives stall warnings. It takes 1.5 hours
> to reproduce and sometimes never happens for several hours.
> 
> It appears related to CPU hotplug since gdb showed me most of the CPUs
> are spinning in multi_cpu_stop() / stop machine after the hang.
> 

Hmm, we do see lots of soft lockups with multi_cpu_stop() in the backtrace,
but with v5.4.y, not with v5.15.y. The actual hang is in stop_machine_yield().
Example:

<0>[63298.624328] watchdog: BUG: soft lockup - CPU#0 stuck for 11s! [migration/0:11]
<4>[63298.624331] Modules linked in: 8021q ccm snd_seq_dummy snd_seq snd_seq_device bridge stp llc tun nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp esp6 ah6 ip6t_REJECT ip6t_ipv6header vhost_vsock vhost vmw_vsock_virtio_transport_common vsock veth rfcomm xt_cgroup cmac algif_hash algif_skcipher af_alg xt_MASQUERADE uinput iwlmvm snd_soc_skl_ssp_clk iwl7000_mac80211 btusb snd_soc_kbl_da7219_max98357a btrtl btintel snd_soc_hdac_hdmi btbcm bluetooth snd_soc_dmic snd_soc_skl ecdh_generic ecc snd_soc_sst_ipc snd_soc_sst_dsp snd_soc_hdac_hda uvcvideo snd_soc_acpi_intel_match snd_soc_acpi snd_hda_ext_core videobuf2_vmalloc videobuf2_v4l2 videobuf2_common snd_intel_dspcfg videobuf2_memops snd_hda_codec snd_hwdep snd_hda_core iwlwifi snd_soc_da7219 snd_soc_max98357a fuse ip6table_nat cfg80211 lzo_rle lzo_compress zram joydev
<4>[63298.624357] CPU: 0 PID: 11 Comm: migration/0 Tainted: G     U  W         5.4.180-17902-g44152654f29b #1
<4>[63298.624358] Hardware name: Google Nami/Nami, BIOS Google_Nami.10775.145.0 09/19/2019
<4>[63298.624363] RIP: 0010:stop_machine_yield+0xb/0xd
<4>[63298.624366] Code: ff 74 b6 f0 ff 0f 75 b1 48 83 c7 08 e8 1f cb f9 ff eb a6 e8 a0 20 e3 ff eb bc e8 50 4b f5 ff 0f 1f 44 00 00 55 48 89 e5 f3 90 <5d> c3 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 81
<4>[63298.624368] RSP: 0000:ffffbaf90006fe38 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff13
<4>[63298.624370] RAX: 0000000000000000 RBX: ffffbaf90300bca8 RCX: 0000000000000000
<4>[63298.624371] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffffb0d46920
<4>[63298.624373] RBP: ffffbaf90006fe38 R08: 0000000000000002 R09: 0000398ecf9a0ac5
<4>[63298.624374] R10: 0000000000000171 R11: ffffffffaf9cfb11 R12: 0000000000000001
<4>[63298.624376] R13: ffff9b09baa22201 R14: ffffffffb0d46920 R15: 0000000000000001
<4>[63298.624377] FS:  0000000000000000(0000) GS:ffff9b09baa00000(0000) knlGS:0000000000000000
<4>[63298.624379] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[63298.624380] CR2: 0000153c00724820 CR3: 0000000171ab8005 CR4: 00000000003606f0
<4>[63298.624382] Call Trace:
<4>[63298.624386]  multi_cpu_stop+0x89/0x119
<4>[63298.624389]  ? stop_two_cpus+0x24d/0x24d
<4>[63298.624391]  cpu_stopper_thread+0x8f/0x111
<4>[63298.624394]  smpboot_thread_fn+0x174/0x212
<4>[63298.624397]  kthread+0x147/0x156
<4>[63298.624399]  ? cpu_report_death+0x43/0x43
<4>[63298.624401]  ? kthread_blkcg+0x2e/0x2e
<4>[63298.624404]  ret_from_fork+0x35/0x40
<0>[63298.624407] Kernel panic - not syncing: softlockup: hung tasks

I guess that is something different ?
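
For anyone wanting to search a pile of crash logs for this signature, here is
a minimal Python sketch. It is not part of any existing tooling: the function
name find_stop_machine_lockups and the report fields are made up for
illustration, and it assumes the logs use the same soft lockup, RIP and call
trace line formats as the example above.

```python
import re

# Header line printed by the soft lockup watchdog, e.g.
# "watchdog: BUG: soft lockup - CPU#0 stuck for 11s! [migration/0:11]"
LOCKUP = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^\]:]+):(\d+)\]"
)
# Symbol part of a line such as "RIP: 0010:stop_machine_yield+0xb/0xd"
RIP = re.compile(r"RIP: \w+:(\w+)\+0x")


def find_stop_machine_lockups(lines):
    """Return soft lockup reports that involve the stop-machine code.

    A report starts at a soft lockup header and runs until the next header.
    It is kept if its RIP symbol is stop_machine_yield or if multi_cpu_stop
    shows up in the lines that follow the header (hypothetical criteria).
    """
    reports = []
    current = None
    for line in lines:
        header = LOCKUP.search(line)
        if header:
            current = {
                "cpu": int(header.group(1)),
                "secs": int(header.group(2)),
                "comm": header.group(3),
                "rip": None,
                "multi_cpu_stop": False,
            }
            reports.append(current)
            continue
        if current is None:
            continue  # ignore everything before the first lockup header
        rip = RIP.search(line)
        if rip and current["rip"] is None:
            current["rip"] = rip.group(1)
        if "multi_cpu_stop+" in line:
            current["multi_cpu_stop"] = True
    return [
        rep for rep in reports
        if rep["multi_cpu_stop"] or rep["rip"] == "stop_machine_yield"
    ]


# A few lines taken from the trace above.
sample = [
    "<0>[63298.624328] watchdog: BUG: soft lockup - CPU#0 stuck for 11s! [migration/0:11]",
    "<4>[63298.624363] RIP: 0010:stop_machine_yield+0xb/0xd",
    "<4>[63298.624382] Call Trace:",
    "<4>[63298.624386]  multi_cpu_stop+0x89/0x119",
]
for hit in find_stop_machine_lockups(sample):
    print(hit)
```

Note that this only catches hangs that trip the soft lockup watchdog; a hang
that prints nothing to the log would not be found this way.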

Guenter
Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Joel Fernandes 2 years, 1 month ago
On Wed, Aug 09, 2023 at 12:25:48PM -0700, Guenter Roeck wrote:
> On Wed, Aug 09, 2023 at 02:35:59PM -0400, Joel Fernandes wrote:
> > On Wed, Aug 9, 2023 at 12:18 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > >
> > > On 8/9/23 06:53, Joel Fernandes wrote:
> > > > On Wed, Aug 09, 2023 at 12:40:36PM +0200, Greg Kroah-Hartman wrote:
> > > >> This is the start of the stable review cycle for the 5.15.126 release.
> > > >> There are 92 patches in this series, all will be posted as a response
> > > >> to this one.  If anyone has any issues with these being applied, please
> > > >> let me know.
> > > >>
> > > >> Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
> > > >> Anything received after that time might be too late.
> > > >>
> > > >> The whole patch series can be found in one patch at:
> > > >>      https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.126-rc1.gz
> > > >> or in the git tree and branch at:
> > > >>      git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> > > >> and the diffstat can be found below.
> > > >
> > > > Not necessarily new with 5.15 stable but 3 of the 19 rcutorture scenarios
> > > > hang with this -rc: TREE04, TREE07, TASKS03.
> > > >
> > > > 5.15 has a known stop machine issue where it hangs after 1.5 hours with cpu
> > > > hotplug rcutorture testing. Me and tglx are continuing to debug this. The
> > > > issue does not show up on anything but 5.15 stable kernels and neither on
> > > > mainline.
> > > >
> > >
> > > Do you by any chance have a crash pattern that we could possibly use to find the crash
> > > in ChromeOS crash logs ? No idea if that would help, but it could provide some
> > > additional data points.
> > 
> > The pattern shows as a hard hang, the system is unresponsive and all CPUs
> > are stuck in stop_machine. Sometimes it recovers on its own from the
> > hang and then RCU immediately gives stall warnings. It takes 1.5 hours
> > to reproduce and sometimes never happens for several hours.
> > 
> > It appears related to CPU hotplug since gdb showed me most of the CPUs
> > are spinning in multi_cpu_stop() / stop machine after the hang.
> > 
> 
> Hmm, we do see lots of soft lockups with multi_cpu_stop() in the backtrace,
> but not with v5.15.y but with v5.4.y. The actual hang is in stop_machine_yield().

Interesting. It looks similar as far as the stack dump in gdb goes. Here are
the stacks I dumped during the hang I referred to:
https://paste.debian.net/1288308/

But in dmesg, it prints nothing for about 20-30 mins before recovering, then
I get RCU stalls. It looks like this:

[  682.721962] kvm-clock: cpu 7, msr 199981c1, secondary cpu clock
[  682.736830] kvm-guest: stealtime: cpu 7, msr 1f5db140
[  684.445875] smpboot: Booting Node 0 Processor 5 APIC 0x5
[  684.467831] kvm-clock: cpu 5, msr 19998141, secondary cpu clock
[  684.555766] kvm-guest: stealtime: cpu 5, msr 1f55b140
[  687.356637] smpboot: Booting Node 0 Processor 4 APIC 0x4
[  687.377214] kvm-clock: cpu 4, msr 19998101, secondary cpu clock
[ 2885.473742] kvm-guest: stealtime: cpu 4, msr 1f51b140
[ 2886.456408] rcu: INFO: rcu_sched self-detected stall on CPU
[ 2886.457590] rcu_torture_fwd_prog_nr: Duration 15423 cver 170 gps 337
[ 2886.464934] rcu: 0-...!: (2 ticks this GP) idle=7eb/0/0x1 softirq=118271/118271 fqs=0 last_accelerate: e3cd/71c0 dyntick_enabled: 1
[ 2886.490837] (t=2199034 jiffies g=185489 q=4)
[ 2886.497297] rcu: rcu_sched kthread timer wakeup didn't happen for 2199031 jiffies! g185489 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[ 2886.514201] rcu: Possible timer handling issue on cpu=0 timer-softirq=441616
[ 2886.524593] rcu: rcu_sched kthread starved for 2199034 jiffies! g185489 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[ 2886.540067] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 2886.551967] rcu: RCU grace-period kthread stack dump:
[ 2886.558644] task:rcu_sched       state:I stack:14896 pid:   15 ppid:     2 flags:0x00004000
[ 2886.569640] Call Trace:
[ 2886.572940]  <TASK>
[ 2886.575902]  __schedule+0x284/0x6e0
[ 2886.580969]  schedule+0x53/0xa0
[ 2886.585231]  schedule_timeout+0x8f/0x130

During that huge gap, I connected gdb and dumped the stacks in the link above.
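
Since the hang itself prints nothing, one way to spot it after the fact is to
look for large jumps between consecutive dmesg timestamps. The following is
only a rough Python sketch under that assumption: find_gaps and the 60-second
threshold are hypothetical choices, and it expects the plain "[ seconds]"
timestamp prefix shown in the log above.

```python
import re

# Leading dmesg timestamp such as "[  687.377214]"
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def find_gaps(lines, threshold=60.0):
    """Return (prev_ts, next_ts, next_line) for every silent period.

    A silent period is a jump of more than `threshold` seconds between two
    consecutive timestamped lines. Lines without a timestamp are skipped.
    """
    gaps = []
    prev = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue
        ts = float(match.group(1))
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts, line.rstrip()))
        prev = ts
    return gaps


# Lines taken from the log above, around the silent period.
sample = [
    "[  687.356637] smpboot: Booting Node 0 Processor 4 APIC 0x4",
    "[  687.377214] kvm-clock: cpu 4, msr 19998101, secondary cpu clock",
    "[ 2885.473742] kvm-guest: stealtime: cpu 4, msr 1f51b140",
    "[ 2886.456408] rcu: INFO: rcu_sched self-detected stall on CPU",
]
for prev_ts, next_ts, line in find_gaps(sample):
    print(f"silent for {next_ts - prev_ts:.0f}s before: {line}")
```

On the sample this flags the single jump from 687.377214 to 2885.473742, which
is the window in which the stacks were dumped.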

On 5.15 stable you could repro it in about an hour and a half most of the time by running something like:
tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 48 --duration 60 --configs TREE04

Let me know if you saw anything like this. I am currently trying to panic the
kernel when the hang happens so I can get better traces.

thanks,

 - Joel

Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Guenter Roeck 2 years, 1 month ago
On 8/9/23 13:14, Joel Fernandes wrote:
> On Wed, Aug 09, 2023 at 12:25:48PM -0700, Guenter Roeck wrote:
>> On Wed, Aug 09, 2023 at 02:35:59PM -0400, Joel Fernandes wrote:
>>> On Wed, Aug 9, 2023 at 12:18 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>>>
>>>> On 8/9/23 06:53, Joel Fernandes wrote:
>>>>> On Wed, Aug 09, 2023 at 12:40:36PM +0200, Greg Kroah-Hartman wrote:
>>>>>> This is the start of the stable review cycle for the 5.15.126 release.
>>>>>> There are 92 patches in this series, all will be posted as a response
>>>>>> to this one.  If anyone has any issues with these being applied, please
>>>>>> let me know.
>>>>>>
>>>>>> Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
>>>>>> Anything received after that time might be too late.
>>>>>>
>>>>>> The whole patch series can be found in one patch at:
>>>>>>       https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.126-rc1.gz
>>>>>> or in the git tree and branch at:
>>>>>>       git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
>>>>>> and the diffstat can be found below.
>>>>>
>>>>> Not necessarily new with 5.15 stable but 3 of the 19 rcutorture scenarios
>>>>> hang with this -rc: TREE04, TREE07, TASKS03.
>>>>>
>>>>> 5.15 has a known stop machine issue where it hangs after 1.5 hours with cpu
>>>>> hotplug rcutorture testing. Me and tglx are continuing to debug this. The
>>>>> issue does not show up on anything but 5.15 stable kernels and neither on
>>>>> mainline.
>>>>>
>>>>
>>>> Do you by any chance have a crash pattern that we could possibly use to find the crash
>>>> in ChromeOS crash logs ? No idea if that would help, but it could provide some
>>>> additional data points.
>>>
>>> The pattern shows as a hard hang, the system is unresponsive and all CPUs
>>> are stuck in stop_machine. Sometimes it recovers on its own from the
>>> hang and then RCU immediately gives stall warnings. It takes 1.5 hours
>>> to reproduce and sometimes never happens for several hours.
>>>
>>> It appears related to CPU hotplug since gdb showed me most of the CPUs
>>> are spinning in multi_cpu_stop() / stop machine after the hang.
>>>
>>
>> Hmm, we do see lots of soft lockups with multi_cpu_stop() in the backtrace,
>> but not with v5.15.y but with v5.4.y. The actual hang is in stop_machine_yield().
> 
> Interesting. It looks similar as far as the stack dump in gdb goes, here are
> the stacks I dumped with the hang I referred to:
> https://paste.debian.net/1288308/
> 

That link gives me "Entry not found".

Guenter

Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Joel Fernandes 2 years, 1 month ago
On Wed, Aug 9, 2023 at 4:38 PM Guenter Roeck <linux@roeck-us.net> wrote:
>
> On 8/9/23 13:14, Joel Fernandes wrote:
> > On Wed, Aug 09, 2023 at 12:25:48PM -0700, Guenter Roeck wrote:
> >> On Wed, Aug 09, 2023 at 02:35:59PM -0400, Joel Fernandes wrote:
> >>> On Wed, Aug 9, 2023 at 12:18 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >>>>
> >>>> On 8/9/23 06:53, Joel Fernandes wrote:
> >>>>> On Wed, Aug 09, 2023 at 12:40:36PM +0200, Greg Kroah-Hartman wrote:
> >>>>>> This is the start of the stable review cycle for the 5.15.126 release.
> >>>>>> There are 92 patches in this series, all will be posted as a response
> >>>>>> to this one.  If anyone has any issues with these being applied, please
> >>>>>> let me know.
> >>>>>>
> >>>>>> Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
> >>>>>> Anything received after that time might be too late.
> >>>>>>
> >>>>>> The whole patch series can be found in one patch at:
> >>>>>>       https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.126-rc1.gz
> >>>>>> or in the git tree and branch at:
> >>>>>>       git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> >>>>>> and the diffstat can be found below.
> >>>>>
> >>>>> Not necessarily new with 5.15 stable but 3 of the 19 rcutorture scenarios
> >>>>> hang with this -rc: TREE04, TREE07, TASKS03.
> >>>>>
> >>>>> 5.15 has a known stop machine issue where it hangs after 1.5 hours with cpu
> >>>>> hotplug rcutorture testing. Me and tglx are continuing to debug this. The
> >>>>> issue does not show up on anything but 5.15 stable kernels and neither on
> >>>>> mainline.
> >>>>>
> >>>>
> >>>> Do you by any chance have a crash pattern that we could possibly use to find the crash
> >>>> in ChromeOS crash logs ? No idea if that would help, but it could provide some
> >>>> additional data points.
> >>>
> >>> The pattern shows as a hard hang, the system is unresponsive and all CPUs
> >>> are stuck in stop_machine. Sometimes it recovers on its own from the
> >>> hang and then RCU immediately gives stall warnings. It takes 1.5 hours
> >>> to reproduce and sometimes never happens for several hours.
> >>>
> >>> It appears related to CPU hotplug since gdb showed me most of the CPUs
> >>> are spinning in multi_cpu_stop() / stop machine after the hang.
> >>>
> >>
> >> Hmm, we do see lots of soft lockups with multi_cpu_stop() in the backtrace,
> >> but not with v5.15.y but with v5.4.y. The actual hang is in stop_machine_yield().
> >
> > Interesting. It looks similar as far as the stack dump in gdb goes, here are
> > the stacks I dumped with the hang I referred to:
> > https://paste.debian.net/1288308/
> >
>
> That link gives me "Entry not found".

Yeah, that was weird. Here it is again: https://pastebin.com/raw/L3nv1kH2
Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Guenter Roeck 2 years, 1 month ago
On 8/9/23 13:39, Joel Fernandes wrote:
> On Wed, Aug 9, 2023 at 4:38 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>
>> On 8/9/23 13:14, Joel Fernandes wrote:
>>> On Wed, Aug 09, 2023 at 12:25:48PM -0700, Guenter Roeck wrote:
>>>> On Wed, Aug 09, 2023 at 02:35:59PM -0400, Joel Fernandes wrote:
>>>>> On Wed, Aug 9, 2023 at 12:18 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>>>>>
>>>>>> On 8/9/23 06:53, Joel Fernandes wrote:
>>>>>>> On Wed, Aug 09, 2023 at 12:40:36PM +0200, Greg Kroah-Hartman wrote:
>>>>>>>> This is the start of the stable review cycle for the 5.15.126 release.
>>>>>>>> There are 92 patches in this series, all will be posted as a response
>>>>>>>> to this one.  If anyone has any issues with these being applied, please
>>>>>>>> let me know.
>>>>>>>>
>>>>>>>> Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
>>>>>>>> Anything received after that time might be too late.
>>>>>>>>
>>>>>>>> The whole patch series can be found in one patch at:
>>>>>>>>        https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.126-rc1.gz
>>>>>>>> or in the git tree and branch at:
>>>>>>>>        git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
>>>>>>>> and the diffstat can be found below.
>>>>>>>
>>>>>>> Not necesscarily new with 5.15 stable but 3 of the 19 rcutorture scenarios
>>>>>>> hang with this -rc: TREE04, TREE07, TASKS03.
>>>>>>>
>>>>>>> 5.15 has a known stop machine issue where it hangs after 1.5 hours with cpu
>>>>>>> hotplug rcutorture testing. Me and tglx are continuing to debug this. The
>>>>>>> issue does not show up on anything but 5.15 stable kernels and neither on
>>>>>>> mainline.
>>>>>>>
>>>>>>
>>>>>> Do you by any have a crash pattern that we could possibly use to find the crash
>>>>>> in ChromeOS crash logs ? No idea if that would help, but it could provide some
>>>>>> additional data points.
>>>>>
>>>>> The pattern shows as a hard hang, the system is unresponsive and all CPUs
>>>>> are stuck in stop_machine. Sometimes it recovers on its own from the
>>>>> hang and then RCU immediately gives stall warnings. It takes 1.5 hour
>>>>> to reproduce and sometimes never happens for several hours.
>>>>>
>>>>> It appears related to CPU hotplug since gdb showed me most of the CPUs
>>>>> are spinning in multi_cpu_stop() / stop machine after the hang.
>>>>>
>>>>
>>>> Hmm, we do see lots of soft lockups with multi_cpu_stop() in the backtrace,
>>>> but not with v5.15.y but with v5.4.y. The actual hang is in stop_machine_yield().
>>>
>>> Interesting. It looks similar as far as the stack dump in gdb goes, here are
>>> the stacks I dumped with the hang I referred to:
>>> https://paste.debian.net/1288308/
>>>
>>
>> That link gives me "Entry not found".
> 
> Yeah that was weird. Here it is again: https://pastebin.com/raw/L3nv1kH2

I found a couple of crash reports from chromeos-5.10, one of them complaining
about RCU issues. I sent you links via IM. Nothing from 5.15 or later, though.

Guenter

Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Paul E. McKenney 2 years, 1 month ago
On Wed, Aug 09, 2023 at 02:45:44PM -0700, Guenter Roeck wrote:
> On 8/9/23 13:39, Joel Fernandes wrote:
> > On Wed, Aug 9, 2023 at 4:38 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > > 
> > > On 8/9/23 13:14, Joel Fernandes wrote:
> > > > On Wed, Aug 09, 2023 at 12:25:48PM -0700, Guenter Roeck wrote:
> > > > > On Wed, Aug 09, 2023 at 02:35:59PM -0400, Joel Fernandes wrote:
> > > > > > On Wed, Aug 9, 2023 at 12:18 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > > > > > > 
> > > > > > > On 8/9/23 06:53, Joel Fernandes wrote:
> > > > > > > > On Wed, Aug 09, 2023 at 12:40:36PM +0200, Greg Kroah-Hartman wrote:
> > > > > > > > > This is the start of the stable review cycle for the 5.15.126 release.
> > > > > > > > > There are 92 patches in this series, all will be posted as a response
> > > > > > > > > to this one.  If anyone has any issues with these being applied, please
> > > > > > > > > let me know.
> > > > > > > > > 
> > > > > > > > > Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
> > > > > > > > > Anything received after that time might be too late.
> > > > > > > > > 
> > > > > > > > > The whole patch series can be found in one patch at:
> > > > > > > > >        https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.126-rc1.gz
> > > > > > > > > or in the git tree and branch at:
> > > > > > > > >        git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> > > > > > > > > and the diffstat can be found below.
> > > > > > > > 
> > > > > > > > Not necesscarily new with 5.15 stable but 3 of the 19 rcutorture scenarios
> > > > > > > > hang with this -rc: TREE04, TREE07, TASKS03.
> > > > > > > > 
> > > > > > > > 5.15 has a known stop machine issue where it hangs after 1.5 hours with cpu
> > > > > > > > hotplug rcutorture testing. Me and tglx are continuing to debug this. The
> > > > > > > > issue does not show up on anything but 5.15 stable kernels and neither on
> > > > > > > > mainline.
> > > > > > > > 
> > > > > > > 
> > > > > > > Do you by any have a crash pattern that we could possibly use to find the crash
> > > > > > > in ChromeOS crash logs ? No idea if that would help, but it could provide some
> > > > > > > additional data points.
> > > > > > 
> > > > > > The pattern shows as a hard hang, the system is unresponsive and all CPUs
> > > > > > are stuck in stop_machine. Sometimes it recovers on its own from the
> > > > > > hang and then RCU immediately gives stall warnings. It takes 1.5 hour
> > > > > > to reproduce and sometimes never happens for several hours.
> > > > > > 
> > > > > > It appears related to CPU hotplug since gdb showed me most of the CPUs
> > > > > > are spinning in multi_cpu_stop() / stop machine after the hang.
> > > > > > 
> > > > > 
> > > > > Hmm, we do see lots of soft lockups with multi_cpu_stop() in the backtrace,
> > > > > but not with v5.15.y but with v5.4.y. The actual hang is in stop_machine_yield().
> > > > 
> > > > Interesting. It looks similar as far as the stack dump in gdb goes, here are
> > > > the stacks I dumped with the hang I referred to:
> > > > https://paste.debian.net/1288308/
> > > > 
> > > 
> > > That link gives me "Entry not found".
> > 
> > Yeah that was weird. Here it is again: https://pastebin.com/raw/L3nv1kH2
> 
> I found a couple of crash reports from chromeos-5.10, one of them complaining
> about RCU issues. I sent you links via IM. Nothing from 5.15 or later, though.

Is the crash showing the eternally refiring timer fixed by this commit?

53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full IRQ entry")

This commit fixed something similar for me in v5.16.

	https://paulmck.livejournal.com/62071.html

							Thanx, Paul
Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Joel Fernandes 2 years, 1 month ago
On Thu, Aug 10, 2023 at 10:55:16AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 09, 2023 at 02:45:44PM -0700, Guenter Roeck wrote:
> > On 8/9/23 13:39, Joel Fernandes wrote:
> > > On Wed, Aug 9, 2023 at 4:38 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > > > 
> > > > On 8/9/23 13:14, Joel Fernandes wrote:
> > > > > On Wed, Aug 09, 2023 at 12:25:48PM -0700, Guenter Roeck wrote:
> > > > > > On Wed, Aug 09, 2023 at 02:35:59PM -0400, Joel Fernandes wrote:
> > > > > > > On Wed, Aug 9, 2023 at 12:18 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > > > > > > > 
> > > > > > > > On 8/9/23 06:53, Joel Fernandes wrote:
> > > > > > > > > On Wed, Aug 09, 2023 at 12:40:36PM +0200, Greg Kroah-Hartman wrote:
> > > > > > > > > > This is the start of the stable review cycle for the 5.15.126 release.
> > > > > > > > > > There are 92 patches in this series, all will be posted as a response
> > > > > > > > > > to this one.  If anyone has any issues with these being applied, please
> > > > > > > > > > let me know.
> > > > > > > > > > 
> > > > > > > > > > Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
> > > > > > > > > > Anything received after that time might be too late.
> > > > > > > > > > 
> > > > > > > > > > The whole patch series can be found in one patch at:
> > > > > > > > > >        https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.126-rc1.gz
> > > > > > > > > > or in the git tree and branch at:
> > > > > > > > > >        git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> > > > > > > > > > and the diffstat can be found below.
> > > > > > > > > 
> > > > > > > > > Not necesscarily new with 5.15 stable but 3 of the 19 rcutorture scenarios
> > > > > > > > > hang with this -rc: TREE04, TREE07, TASKS03.
> > > > > > > > > 
> > > > > > > > > 5.15 has a known stop machine issue where it hangs after 1.5 hours with cpu
> > > > > > > > > hotplug rcutorture testing. Me and tglx are continuing to debug this. The
> > > > > > > > > issue does not show up on anything but 5.15 stable kernels and neither on
> > > > > > > > > mainline.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Do you by any have a crash pattern that we could possibly use to find the crash
> > > > > > > > in ChromeOS crash logs ? No idea if that would help, but it could provide some
> > > > > > > > additional data points.
> > > > > > > 
> > > > > > > The pattern shows as a hard hang, the system is unresponsive and all CPUs
> > > > > > > are stuck in stop_machine. Sometimes it recovers on its own from the
> > > > > > > hang and then RCU immediately gives stall warnings. It takes 1.5 hour
> > > > > > > to reproduce and sometimes never happens for several hours.
> > > > > > > 
> > > > > > > It appears related to CPU hotplug since gdb showed me most of the CPUs
> > > > > > > are spinning in multi_cpu_stop() / stop machine after the hang.
> > > > > > > 
> > > > > > 
> > > > > > Hmm, we do see lots of soft lockups with multi_cpu_stop() in the backtrace,
> > > > > > but not with v5.15.y but with v5.4.y. The actual hang is in stop_machine_yield().
> > > > > 
> > > > > Interesting. It looks similar as far as the stack dump in gdb goes, here are
> > > > > the stacks I dumped with the hang I referred to:
> > > > > https://paste.debian.net/1288308/
> > > > > 
> > > > 
> > > > That link gives me "Entry not found".
> > > 
> > > Yeah that was weird. Here it is again: https://pastebin.com/raw/L3nv1kH2
> > 
> > I found a couple of crash reports from chromeos-5.10, one of them complaining
> > about RCU issues. I sent you links via IM. Nothing from 5.15 or later, though.
> 
> Is the crash showing the eternally refiring timer fixed by this commit?
> 
> 53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full IRQ entry")

Ah, I was just replying. I have been seeing really good results since
yesterday, after applying the following 3 commits:

53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full IRQ entry")
5417ddc1cf1f ("timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped")
a1ff03cd6fb9 ("tick: Detect and fix jiffies update stall")

5417ddc1cf1f also mentions a "tick storm", which is exactly what I was
seeing.

I did a lengthy test and everything is looking good. I'll send these out to
the stable list.

thanks,

 - Joel


Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Guenter Roeck 2 years, 1 month ago
On 8/10/23 14:54, Joel Fernandes wrote:
> On Thu, Aug 10, 2023 at 10:55:16AM -0700, Paul E. McKenney wrote:
>> On Wed, Aug 09, 2023 at 02:45:44PM -0700, Guenter Roeck wrote:
>>> On 8/9/23 13:39, Joel Fernandes wrote:
>>>> On Wed, Aug 9, 2023 at 4:38 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>>>>
>>>>> On 8/9/23 13:14, Joel Fernandes wrote:
>>>>>> On Wed, Aug 09, 2023 at 12:25:48PM -0700, Guenter Roeck wrote:
>>>>>>> On Wed, Aug 09, 2023 at 02:35:59PM -0400, Joel Fernandes wrote:
>>>>>>>> On Wed, Aug 9, 2023 at 12:18 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>>>>>>>>
>>>>>>>>> On 8/9/23 06:53, Joel Fernandes wrote:
>>>>>>>>>> On Wed, Aug 09, 2023 at 12:40:36PM +0200, Greg Kroah-Hartman wrote:
>>>>>>>>>>> This is the start of the stable review cycle for the 5.15.126 release.
>>>>>>>>>>> There are 92 patches in this series, all will be posted as a response
>>>>>>>>>>> to this one.  If anyone has any issues with these being applied, please
>>>>>>>>>>> let me know.
>>>>>>>>>>>
>>>>>>>>>>> Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
>>>>>>>>>>> Anything received after that time might be too late.
>>>>>>>>>>>
>>>>>>>>>>> The whole patch series can be found in one patch at:
>>>>>>>>>>>         https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.126-rc1.gz
>>>>>>>>>>> or in the git tree and branch at:
>>>>>>>>>>>         git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
>>>>>>>>>>> and the diffstat can be found below.
>>>>>>>>>>
>>>>>>>>>> Not necesscarily new with 5.15 stable but 3 of the 19 rcutorture scenarios
>>>>>>>>>> hang with this -rc: TREE04, TREE07, TASKS03.
>>>>>>>>>>
>>>>>>>>>> 5.15 has a known stop machine issue where it hangs after 1.5 hours with cpu
>>>>>>>>>> hotplug rcutorture testing. Me and tglx are continuing to debug this. The
>>>>>>>>>> issue does not show up on anything but 5.15 stable kernels and neither on
>>>>>>>>>> mainline.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Do you by any have a crash pattern that we could possibly use to find the crash
>>>>>>>>> in ChromeOS crash logs ? No idea if that would help, but it could provide some
>>>>>>>>> additional data points.
>>>>>>>>
>>>>>>>> The pattern shows as a hard hang, the system is unresponsive and all CPUs
>>>>>>>> are stuck in stop_machine. Sometimes it recovers on its own from the
>>>>>>>> hang and then RCU immediately gives stall warnings. It takes 1.5 hour
>>>>>>>> to reproduce and sometimes never happens for several hours.
>>>>>>>>
>>>>>>>> It appears related to CPU hotplug since gdb showed me most of the CPUs
>>>>>>>> are spinning in multi_cpu_stop() / stop machine after the hang.
>>>>>>>>
>>>>>>>
>>>>>>> Hmm, we do see lots of soft lockups with multi_cpu_stop() in the backtrace,
>>>>>>> but not with v5.15.y but with v5.4.y. The actual hang is in stop_machine_yield().
>>>>>>
>>>>>> Interesting. It looks similar as far as the stack dump in gdb goes, here are
>>>>>> the stacks I dumped with the hang I referred to:
>>>>>> https://paste.debian.net/1288308/
>>>>>>
>>>>>
>>>>> That link gives me "Entry not found".
>>>>
>>>> Yeah that was weird. Here it is again: https://pastebin.com/raw/L3nv1kH2
>>>
>>> I found a couple of crash reports from chromeos-5.10, one of them complaining
>>> about RCU issues. I sent you links via IM. Nothing from 5.15 or later, though.
>>
>> Is the crash showing the eternally refiring timer fixed by this commit?
>>
>> 53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full IRQ entry")
> 
> Ah I was just replying, I have been seeing really good results after applying
> the following 3 commits since yesterday:
> 
> 53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full IRQ entry")
> 5417ddc1cf1f ("timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped")
> a1ff03cd6fb9 ("tick: Detect and fix jiffies update stall")
> 

Would those also apply to v5.10.y, or just v5.15.y?

Thanks,
Guenter

> 5417ddc1cf1f also mentioned a "tick storm" which is exactly what I was
> seeing.
> 
> I did a lengthy test and everything is looking good. I'll send these out to
> the stable list.
> 
> thanks,
> 
>   - Joel
> 
> 

Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Joel Fernandes 2 years, 1 month ago

> On Aug 10, 2023, at 6:55 PM, Guenter Roeck <linux@roeck-us.net> wrote:
> 
> On 8/10/23 14:54, Joel Fernandes wrote:
>>> On Thu, Aug 10, 2023 at 10:55:16AM -0700, Paul E. McKenney wrote:
>>> On Wed, Aug 09, 2023 at 02:45:44PM -0700, Guenter Roeck wrote:
>>>> On 8/9/23 13:39, Joel Fernandes wrote:
>>>>> On Wed, Aug 9, 2023 at 4:38 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>>>>> 
>>>>>> On 8/9/23 13:14, Joel Fernandes wrote:
>>>>>>> On Wed, Aug 09, 2023 at 12:25:48PM -0700, Guenter Roeck wrote:
>>>>>>>> On Wed, Aug 09, 2023 at 02:35:59PM -0400, Joel Fernandes wrote:
>>>>>>>>> On Wed, Aug 9, 2023 at 12:18 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>>>>>>>>> 
>>>>>>>>>> On 8/9/23 06:53, Joel Fernandes wrote:
>>>>>>>>>>> On Wed, Aug 09, 2023 at 12:40:36PM +0200, Greg Kroah-Hartman wrote:
>>>>>>>>>>>> This is the start of the stable review cycle for the 5.15.126 release.
>>>>>>>>>>>> There are 92 patches in this series, all will be posted as a response
>>>>>>>>>>>> to this one.  If anyone has any issues with these being applied, please
>>>>>>>>>>>> let me know.
>>>>>>>>>>>> 
>>>>>>>>>>>> Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
>>>>>>>>>>>> Anything received after that time might be too late.
>>>>>>>>>>>> 
>>>>>>>>>>>> The whole patch series can be found in one patch at:
>>>>>>>>>>>>        https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.126-rc1.gz
>>>>>>>>>>>> or in the git tree and branch at:
>>>>>>>>>>>>        git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
>>>>>>>>>>>> and the diffstat can be found below.
>>>>>>>>>>> 
>>>>>>>>>>> Not necesscarily new with 5.15 stable but 3 of the 19 rcutorture scenarios
>>>>>>>>>>> hang with this -rc: TREE04, TREE07, TASKS03.
>>>>>>>>>>> 
>>>>>>>>>>> 5.15 has a known stop machine issue where it hangs after 1.5 hours with cpu
>>>>>>>>>>> hotplug rcutorture testing. Me and tglx are continuing to debug this. The
>>>>>>>>>>> issue does not show up on anything but 5.15 stable kernels and neither on
>>>>>>>>>>> mainline.
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Do you by any have a crash pattern that we could possibly use to find the crash
>>>>>>>>>> in ChromeOS crash logs ? No idea if that would help, but it could provide some
>>>>>>>>>> additional data points.
>>>>>>>>> 
>>>>>>>>> The pattern shows as a hard hang, the system is unresponsive and all CPUs
>>>>>>>>> are stuck in stop_machine. Sometimes it recovers on its own from the
>>>>>>>>> hang and then RCU immediately gives stall warnings. It takes 1.5 hour
>>>>>>>>> to reproduce and sometimes never happens for several hours.
>>>>>>>>> 
>>>>>>>>> It appears related to CPU hotplug since gdb showed me most of the CPUs
>>>>>>>>> are spinning in multi_cpu_stop() / stop machine after the hang.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Hmm, we do see lots of soft lockups with multi_cpu_stop() in the backtrace,
>>>>>>>> but not with v5.15.y but with v5.4.y. The actual hang is in stop_machine_yield().
>>>>>>> 
>>>>>>> Interesting. It looks similar as far as the stack dump in gdb goes, here are
>>>>>>> the stacks I dumped with the hang I referred to:
>>>>>>> https://paste.debian.net/1288308/
>>>>>>> 
>>>>>> 
>>>>>> That link gives me "Entry not found".
>>>>> 
>>>>> Yeah that was weird. Here it is again: https://pastebin.com/raw/L3nv1kH2
>>>> 
>>>> I found a couple of crash reports from chromeos-5.10, one of them complaining
>>>> about RCU issues. I sent you links via IM. Nothing from 5.15 or later, though.
>>> 
>>> Is the crash showing the eternally refiring timer fixed by this commit?
>>> 
>>> 53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full IRQ entry")
>> Ah I was just replying, I have been seeing really good results after applying
>> the following 3 commits since yesterday:
>> 53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full IRQ entry")
>> 5417ddc1cf1f ("timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped")
>> a1ff03cd6fb9 ("tick: Detect and fix jiffies update stall")
> 
> Would those also apply to v5.10.y, or just 5.15.y ?

All but one apply to 5.10. I am currently doing more testing with it and will post to stable for 5.10 as well.

Thanks,

 - Joel



> 
> Thanks,
> Guenter
> 
>> 5417ddc1cf1f also mentioned a "tick storm" which is exactly what I was
>> seeing.
>> I did a lengthy test and everything is looking good. I'll send these out to
>> the stable list.
>> thanks,
>>  - Joel
> 
Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Joel Fernandes 2 years, 1 month ago
On Thu, Aug 10, 2023 at 09:54:16PM +0000, Joel Fernandes wrote:
> On Thu, Aug 10, 2023 at 10:55:16AM -0700, Paul E. McKenney wrote:
> > On Wed, Aug 09, 2023 at 02:45:44PM -0700, Guenter Roeck wrote:
> > > On 8/9/23 13:39, Joel Fernandes wrote:
> > > > On Wed, Aug 9, 2023 at 4:38 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > > > > 
> > > > > On 8/9/23 13:14, Joel Fernandes wrote:
> > > > > > On Wed, Aug 09, 2023 at 12:25:48PM -0700, Guenter Roeck wrote:
> > > > > > > On Wed, Aug 09, 2023 at 02:35:59PM -0400, Joel Fernandes wrote:
> > > > > > > > On Wed, Aug 9, 2023 at 12:18 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > > > > > > > > 
> > > > > > > > > On 8/9/23 06:53, Joel Fernandes wrote:
> > > > > > > > > > On Wed, Aug 09, 2023 at 12:40:36PM +0200, Greg Kroah-Hartman wrote:
> > > > > > > > > > > This is the start of the stable review cycle for the 5.15.126 release.
> > > > > > > > > > > There are 92 patches in this series, all will be posted as a response
> > > > > > > > > > > to this one.  If anyone has any issues with these being applied, please
> > > > > > > > > > > let me know.
> > > > > > > > > > > 
> > > > > > > > > > > Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
> > > > > > > > > > > Anything received after that time might be too late.
> > > > > > > > > > > 
> > > > > > > > > > > The whole patch series can be found in one patch at:
> > > > > > > > > > >        https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.126-rc1.gz
> > > > > > > > > > > or in the git tree and branch at:
> > > > > > > > > > >        git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> > > > > > > > > > > and the diffstat can be found below.
> > > > > > > > > > 
> > > > > > > > > > Not necesscarily new with 5.15 stable but 3 of the 19 rcutorture scenarios
> > > > > > > > > > hang with this -rc: TREE04, TREE07, TASKS03.
> > > > > > > > > > 
> > > > > > > > > > 5.15 has a known stop machine issue where it hangs after 1.5 hours with cpu
> > > > > > > > > > hotplug rcutorture testing. Me and tglx are continuing to debug this. The
> > > > > > > > > > issue does not show up on anything but 5.15 stable kernels and neither on
> > > > > > > > > > mainline.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Do you by any have a crash pattern that we could possibly use to find the crash
> > > > > > > > > in ChromeOS crash logs ? No idea if that would help, but it could provide some
> > > > > > > > > additional data points.
> > > > > > > > 
> > > > > > > > The pattern shows as a hard hang, the system is unresponsive and all CPUs
> > > > > > > > are stuck in stop_machine. Sometimes it recovers on its own from the
> > > > > > > > hang and then RCU immediately gives stall warnings. It takes 1.5 hour
> > > > > > > > to reproduce and sometimes never happens for several hours.
> > > > > > > > 
> > > > > > > > It appears related to CPU hotplug since gdb showed me most of the CPUs
> > > > > > > > are spinning in multi_cpu_stop() / stop machine after the hang.
> > > > > > > > 
> > > > > > > 
> > > > > > > Hmm, we do see lots of soft lockups with multi_cpu_stop() in the backtrace,
> > > > > > > but not with v5.15.y but with v5.4.y. The actual hang is in stop_machine_yield().
> > > > > > 
> > > > > > Interesting. It looks similar as far as the stack dump in gdb goes, here are
> > > > > > the stacks I dumped with the hang I referred to:
> > > > > > https://paste.debian.net/1288308/
> > > > > > 
> > > > > 
> > > > > That link gives me "Entry not found".
> > > > 
> > > > Yeah that was weird. Here it is again: https://pastebin.com/raw/L3nv1kH2
> > > 
> > > I found a couple of crash reports from chromeos-5.10, one of them complaining
> > > about RCU issues. I sent you links via IM. Nothing from 5.15 or later, though.
> > 
> > Is the crash showing the eternally refiring timer fixed by this commit?
> > 
> > 53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full IRQ entry")
> 
> Ah I was just replying, I have been seeing really good results after applying
> the following 3 commits since yesterday:
> 
> 53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full IRQ entry")
> 5417ddc1cf1f ("timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped")
> a1ff03cd6fb9 ("tick: Detect and fix jiffies update stall")
> 
> 5417ddc1cf1f also mentioned a "tick storm" which is exactly what I was
> seeing.
> 
> I did a lengthy test and everything is looking good. I'll send these out to
> the stable list.

I just read your post for the first time. Just to humor you: my debugging
was very similar to yours, and I got as far as this statement in your post
(before looking for fixes in the timer code):
<quote>
Further checking showed that the stuck CPU was in fact suffering from an
interrupt storm, namely an interrupt storm of scheduling-clock interrupts.
This spurred another code-inspection session.
</quote>

My detection of this came from gdb. Within that 2000-second stall, I broke
into the VM with --gdb and kept dumping the stuck CPU's stack with "thread X"
and "bt". I noticed that it was always in the timer interrupt. Here are the
stacks: https://pastebin.com/raw/L3nv1kH2

Then I narrowed my search down to timer events by enabling the
ftrace_dump_on_oops boot option along with the panic-on-stall ones, and
noticed a storm of hrtimer_start events coming out of the long stall. I was
all but certain it was a tick storm, and noticed that it kept programming the
hrtimer to the same event.
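
As a rough illustration of spotting that pattern in a dumped trace (a sketch
only, not the script used in this debugging session), the following counts how
often each expiry value is programmed by hrtimer_start events. It assumes the
trace lines carry an "expires=<ns>" field as printed by the hrtimer_start
tracepoint. The function name repeated_expiries, the threshold parameter, and
the SAMPLE lines are all hypothetical and made up for illustration.

```python
import re
from collections import Counter

# Matches the "expires=<ns>" field of an hrtimer_start event. The \b keeps
# the pattern from matching inside "softexpires=".
EXPIRES_RE = re.compile(r"hrtimer_start:.*?\bexpires=(\d+)")


def repeated_expiries(trace_lines, threshold=3):
    """Return {expires_ns: count} for expiries programmed >= threshold times."""
    counts = Counter()
    for line in trace_lines:
        match = EXPIRES_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return {exp: n for exp, n in counts.items() if n >= threshold}


# Hypothetical sample input, invented for this sketch (not real trace data).
SAMPLE = [
    "<idle>-0 [003] d.h. 100.000100: hrtimer_start: hrtimer=00000000abcd "
    "function=tick_sched_timer expires=100004000000 softexpires=100004000000 mode=ABS",
    "<idle>-0 [003] d.h. 100.000400: hrtimer_start: hrtimer=00000000abcd "
    "function=tick_sched_timer expires=100004000000 softexpires=100004000000 mode=ABS",
    "<idle>-0 [003] d.h. 100.000700: hrtimer_expire_entry: hrtimer=00000000abcd "
    "function=tick_sched_timer now=100000700000",
    "<idle>-0 [003] d.h. 100.000800: hrtimer_start: hrtimer=00000000abcd "
    "function=tick_sched_timer expires=100004000000 softexpires=100004000000 mode=ABS",
    "<idle>-0 [003] d.h. 100.001100: hrtimer_start: hrtimer=00000000abcd "
    "function=tick_sched_timer expires=100004000000 softexpires=100004000000 mode=ABS",
    "<idle>-0 [001] d.h. 100.001200: hrtimer_start: hrtimer=00000000ef01 "
    "function=tick_sched_timer expires=100008000000 softexpires=100008000000 mode=ABS",
]

if __name__ == "__main__":
    for expires, count in repeated_expiries(SAMPLE).items():
        print(f"expires={expires} programmed {count} times")
```

An expiry that shows up many times in a short window is the kind of signal
described above: the same event being programmed over and over.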

Ah, then I just did a "git diff" in kernel/time/ between v5.15 and v6.1 and
noticed the missing patches. ;-)

Though in my experience, I wasn't seeing a KTIME_MAX-type value like the one
you mentioned in the post. What I noticed is that the tick was never stopped:
it just kept firing a bit earlier than requested, and in the interrupt exit
path (of the delivered-too-early timer interrupt) it kept re-requesting the
tick.
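
To make the early-delivery loop concrete, here is a toy model (a sketch under
stated assumptions, not the kernel's tick code). It assumes a device that
delivers the interrupt early_ns before the programmed expiry and a handler
that, finding the tick not yet due, re-requests the same expiry. The function
simulate_early_tick and all of its parameters are hypothetical. In this toy
the storm ends once the clock reaches the expiry, whereas the stall described
in this thread lasted far longer.

```python
def simulate_early_tick(requested_ns, early_ns, handler_cost_ns, max_irqs=1000):
    """Count interrupts taken before the clock reaches requested_ns.

    Toy model only: the device fires early_ns before the programmed expiry,
    and the handler re-programs the same expiry while the tick is not due.
    """
    now = 0
    irqs = 0
    while irqs < max_irqs:
        # The device fires early, but never at a time already in the past.
        now = max(now, requested_ns - early_ns)
        irqs += 1
        # Account for the time spent handling the interrupt.
        now += handler_cost_ns
        if now >= requested_ns:
            break  # the tick is finally due, so nothing is re-requested
        # Otherwise the exit path re-requests the same expiry and we loop.
    return irqs


if __name__ == "__main__":
    print(simulate_early_tick(1_000_000, 0, 1_000))
    print(simulate_early_tick(1_000_000, 50_000, 1_000))
```

With no early delivery the model takes a single interrupt; with early
delivery it takes one interrupt per handler_cost_ns until the requested
expiry is reached.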

thanks,

 - Joel

Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Paul E. McKenney 2 years, 1 month ago
On Thu, Aug 10, 2023 at 10:14:16PM +0000, Joel Fernandes wrote:
> On Thu, Aug 10, 2023 at 09:54:16PM +0000, Joel Fernandes wrote:
> > On Thu, Aug 10, 2023 at 10:55:16AM -0700, Paul E. McKenney wrote:
> > > On Wed, Aug 09, 2023 at 02:45:44PM -0700, Guenter Roeck wrote:
> > > > On 8/9/23 13:39, Joel Fernandes wrote:
> > > > > On Wed, Aug 9, 2023 at 4:38 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > > > > > 
> > > > > > On 8/9/23 13:14, Joel Fernandes wrote:
> > > > > > > On Wed, Aug 09, 2023 at 12:25:48PM -0700, Guenter Roeck wrote:
> > > > > > > > On Wed, Aug 09, 2023 at 02:35:59PM -0400, Joel Fernandes wrote:
> > > > > > > > > On Wed, Aug 9, 2023 at 12:18 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > > > > > > > > > 
> > > > > > > > > > On 8/9/23 06:53, Joel Fernandes wrote:
> > > > > > > > > > > On Wed, Aug 09, 2023 at 12:40:36PM +0200, Greg Kroah-Hartman wrote:
> > > > > > > > > > > > This is the start of the stable review cycle for the 5.15.126 release.
> > > > > > > > > > > > There are 92 patches in this series, all will be posted as a response
> > > > > > > > > > > > to this one.  If anyone has any issues with these being applied, please
> > > > > > > > > > > > let me know.
> > > > > > > > > > > > 
> > > > > > > > > > > > Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
> > > > > > > > > > > > Anything received after that time might be too late.
> > > > > > > > > > > > 
> > > > > > > > > > > > The whole patch series can be found in one patch at:
> > > > > > > > > > > >        https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.126-rc1.gz
> > > > > > > > > > > > or in the git tree and branch at:
> > > > > > > > > > > >        git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> > > > > > > > > > > > and the diffstat can be found below.
> > > > > > > > > > > 
> > > > > > > > > > > Not necesscarily new with 5.15 stable but 3 of the 19 rcutorture scenarios
> > > > > > > > > > > hang with this -rc: TREE04, TREE07, TASKS03.
> > > > > > > > > > > 
> > > > > > > > > > > 5.15 has a known stop machine issue where it hangs after 1.5 hours with cpu
> > > > > > > > > > > hotplug rcutorture testing. Me and tglx are continuing to debug this. The
> > > > > > > > > > > issue does not show up on anything but 5.15 stable kernels and neither on
> > > > > > > > > > > mainline.
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Do you by any have a crash pattern that we could possibly use to find the crash
> > > > > > > > > > in ChromeOS crash logs ? No idea if that would help, but it could provide some
> > > > > > > > > > additional data points.
> > > > > > > > > 
> > > > > > > > > The pattern shows as a hard hang, the system is unresponsive and all CPUs
> > > > > > > > > are stuck in stop_machine. Sometimes it recovers on its own from the
> > > > > > > > > hang and then RCU immediately gives stall warnings. It takes 1.5 hour
> > > > > > > > > to reproduce and sometimes never happens for several hours.
> > > > > > > > > 
> > > > > > > > > It appears related to CPU hotplug since gdb showed me most of the CPUs
> > > > > > > > > are spinning in multi_cpu_stop() / stop machine after the hang.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Hmm, we do see lots of soft lockups with multi_cpu_stop() in the backtrace,
> > > > > > > > but not with v5.15.y but with v5.4.y. The actual hang is in stop_machine_yield().
> > > > > > > 
> > > > > > > Interesting. It looks similar as far as the stack dump in gdb goes, here are
> > > > > > > the stacks I dumped with the hang I referred to:
> > > > > > > https://paste.debian.net/1288308/
> > > > > > > 
> > > > > > 
> > > > > > That link gives me "Entry not found".
> > > > > 
> > > > > Yeah that was weird. Here it is again: https://pastebin.com/raw/L3nv1kH2
> > > > 
> > > > I found a couple of crash reports from chromeos-5.10, one of them complaining
> > > > about RCU issues. I sent you links via IM. Nothing from 5.15 or later, though.
> > > 
> > > Is the crash showing the eternally refiring timer fixed by this commit?
> > > 
> > > 53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full IRQ entry")
> > 
> > Ah I was just replying, I have been seeing really good results after applying
> > the following 3 commits since yesterday:
> > 
> > 53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full IRQ entry")
> > 5417ddc1cf1f ("timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped")
> > a1ff03cd6fb9 ("tick: Detect and fix jiffies update stall")
> > 
> > 5417ddc1cf1f also mentioned a "tick storm" which is exactly what I was
> > seeing.
> > 
> > I did a lengthy test and everything is looking good. I'll send these out to
> > the stable list.
> 
> I just read your post for the first time. And just to humor you about my
> debugging which was very similar to yours, I got as far as this statement in
> your post (before looking for fixes in timer code):
> <quote>
> Further checking showed that the stuck CPU was in fact suffering from an
> interrupt storm, namely an interrupt storm of scheduling-clock interrupts.
> This spurred another code-inspection session.
> </quote>
> 
> My detection of this came from gdb, within that 2000 second stall, I broke
> into the VM with --gdb and kept dumping the stuck CPU's stack with "thread X"
> and "bt". I noticed that it was always in the timer interrupt. Here were the
> stacks: https://pastebin.com/raw/L3nv1kH2
> 
> Then I narrowed my search down to timer events by enabling
> boot options ftrace_dump_on_oops and panic-on-stall ones, and noticed a storm
> of hrtimer_start coming out of the long stall. I was all but certain it was a
> tick storm and noticed it kept programming hrtimer to the same event.
> 
> Ah, then I just did a "git diff" in kernel/time/ between v5.15 and v6.1 and
> noticed the missing patches. ;-)
> 
> Though in my experience, I wasn't seeing a KTIME_MAX-type of value like you
> mentioned in the post. What I noticed is that the tick was never stopped, it
> just kept firing a bit earlier than was requested and in the interrupt exit
> path (of the delivered-too-early timer interrupt), it kept re-requesting the
> tick.

That "git diff" wouldn't have shown me much at the time, but I am very
glad that you found it!

							Thanx, Paul
Re: [PATCH 5.15 00/92] 5.15.126-rc1 review
Posted by Joel Fernandes 2 years, 1 month ago
On Wed, Aug 9, 2023 at 2:35 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Wed, Aug 9, 2023 at 12:18 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >
> > On 8/9/23 06:53, Joel Fernandes wrote:
> > > On Wed, Aug 09, 2023 at 12:40:36PM +0200, Greg Kroah-Hartman wrote:
> > >> This is the start of the stable review cycle for the 5.15.126 release.
> > >> There are 92 patches in this series, all will be posted as a response
> > >> to this one.  If anyone has any issues with these being applied, please
> > >> let me know.
> > >>
> > >> Responses should be made by Fri, 11 Aug 2023 10:36:10 +0000.
> > >> Anything received after that time might be too late.
> > >>
> > >> The whole patch series can be found in one patch at:
> > >>      https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.126-rc1.gz
> > >> or in the git tree and branch at:
> > >>      git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> > >> and the diffstat can be found below.
> > >
> > > > Not necessarily new with 5.15 stable but 3 of the 19 rcutorture scenarios
> > > hang with this -rc: TREE04, TREE07, TASKS03.
> > >
> > > > 5.15 has a known stop machine issue where it hangs after 1.5 hours with cpu
> > > > hotplug rcutorture testing. tglx and I are continuing to debug this. The
> > > > issue does not show up on anything but 5.15 stable kernels; in particular,
> > > > it does not show up on mainline.
> > >
> >
> > Do you by any chance have a crash pattern that we could possibly use to find
> > the crash in ChromeOS crash logs? No idea if that would help, but it could
> > provide some additional data points.
>
> The pattern shows as a hard hang: the system is unresponsive and all CPUs
> are stuck in stop_machine. Sometimes it recovers on its own from the
> hang and then RCU immediately gives stall warnings. It takes 1.5 hours
> to reproduce and sometimes never happens for several hours.
>
> It appears related to CPU hotplug since gdb showed me most of the CPUs
> are spinning in multi_cpu_stop() / stop machine after the hang.
>

Adding to this, it appears one of the CPUs is constantly firing and
reprogramming hrtimer events for some reason every few hundred
microseconds (I see this in gdb). My debug angle right now is to
figure out why it does that but collecting a trace is hard as it
appears even trace collection may not be happening once hung and the
only traces I am getting are the ones after the hang recovers, not
during the hang.  I am also trying to see if multi_cpu_stop() can
panic the kernel if it sits there too long.

 - Joel
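
The idea of making a stuck stop-machine wait fail loudly instead of spinning forever can be sketched in user space. The following is a minimal analogy only, assuming a simple deadline check inside the spin loop; `spin_until`, `StopTimeout`, and the `clock` parameter are hypothetical names for this illustration and do not correspond to anything in multi_cpu_stop() itself.

```python
import time


class StopTimeout(RuntimeError):
    """Raised when the spin loop passes its deadline (stand-in for a panic)."""


def spin_until(done, timeout_s, clock=time.monotonic):
    """Spin until done() returns True; raise StopTimeout once timeout_s elapses.

    Returns the number of spins taken. The clock is injectable so that the
    timeout path can be exercised without actually waiting.
    """
    deadline = clock() + timeout_s
    spins = 0
    while not done():
        if clock() >= deadline:
            raise StopTimeout(f"still spinning after {timeout_s}s ({spins} spins)")
        spins += 1
    return spins
```

The point of the deadline is diagnostic: failing at the moment the wait has gone on too long captures state during the hang, rather than only after it recovers.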