From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: David Hildenbrand, Andrew Morton, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, "H. Peter Anvin", Boris Ostrovsky, Juergen Gross,
 Stefano Stabellini, "Michael S. Tsirkin", Jason Wang, Dave Young,
 Baoquan He, Vivek Goyal, Michal Hocko, Oscar Salvador, Mike Rapoport,
 "Rafael J. Wysocki", x86@kernel.org, xen-devel@lists.xenproject.org,
 virtualization@lists.linux-foundation.org, kexec@lists.infradead.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v1 4/8] proc/vmcore: convert oldmem_pfn_is_ram callback to
 more generic vmcore callbacks
Date: Tue, 28 Sep 2021 20:22:54 +0200
Message-Id: <20210928182258.12451-5-david@redhat.com>
In-Reply-To: <20210928182258.12451-1-david@redhat.com>
References: <20210928182258.12451-1-david@redhat.com>

Let's support multiple registered callbacks, making sure that
registering vmcore callbacks cannot fail. Make the callback return a
bool instead of an int, handling how to deal with errors internally.
Drop unused HAVE_OLDMEM_PFN_IS_RAM.

We soon want to make use of this infrastructure from other drivers:
virtio-mem, registering one callback for each virtio-mem device, to
prevent reading unplugged virtio-mem memory.
Handle it via a generic vmcore_cb structure, prepared for future
extensions: for example, once we support virtio-mem on s390x where the
vmcore is completely constructed in the second kernel, we want to detect
and add plugged virtio-mem memory ranges to the vmcore in order for them
to get dumped properly.

Handle corner cases that are unexpected and shouldn't happen in sane
setups: registering a callback after the vmcore has already been opened
(warn only) and unregistering a callback after the vmcore has already
been opened (warn and essentially read only zeroes from that point on).

Signed-off-by: David Hildenbrand
---
 arch/x86/kernel/aperture_64.c | 13 ++++-
 arch/x86/xen/mmu_hvm.c        | 15 +++---
 fs/proc/vmcore.c              | 99 ++++++++++++++++++++++++-----------
 include/linux/crash_dump.h    | 26 +++++++--
 4 files changed, 113 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index 10562885f5fc..af3ba08b684b 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -73,12 +73,23 @@ static int gart_mem_pfn_is_ram(unsigned long pfn)
 		      (pfn >= aperture_pfn_start + aperture_page_count));
 }
 
+#ifdef CONFIG_PROC_VMCORE
+static bool gart_oldmem_pfn_is_ram(struct vmcore_cb *cb, unsigned long pfn)
+{
+	return !!gart_mem_pfn_is_ram(pfn);
+}
+
+static struct vmcore_cb gart_vmcore_cb = {
+	.pfn_is_ram = gart_oldmem_pfn_is_ram,
+};
+#endif
+
 static void __init exclude_from_core(u64 aper_base, u32 aper_order)
 {
 	aperture_pfn_start = aper_base >> PAGE_SHIFT;
 	aperture_page_count = (32 * 1024 * 1024) << aper_order >> PAGE_SHIFT;
 #ifdef CONFIG_PROC_VMCORE
-	WARN_ON(register_oldmem_pfn_is_ram(&gart_mem_pfn_is_ram));
+	register_vmcore_cb(&gart_vmcore_cb);
 #endif
 #ifdef CONFIG_PROC_KCORE
 	WARN_ON(register_mem_pfn_is_ram(&gart_mem_pfn_is_ram));
diff --git a/arch/x86/xen/mmu_hvm.c b/arch/x86/xen/mmu_hvm.c
index eb61622df75b..49bd4a6a5858 100644
--- a/arch/x86/xen/mmu_hvm.c
+++ b/arch/x86/xen/mmu_hvm.c
@@ -12,10 +12,10 @@
  * The kdump kernel has to check whether a pfn of the crashed kernel
  * was a ballooned page. vmcore is using this function to decide
  * whether to access a pfn of the crashed kernel.
- * Returns 0 if the pfn is not backed by a RAM page, the caller may
+ * Returns "false" if the pfn is not backed by a RAM page, the caller may
  * handle the pfn special in this case.
  */
-static int xen_oldmem_pfn_is_ram(unsigned long pfn)
+static bool xen_vmcore_pfn_is_ram(struct vmcore_cb *cb, unsigned long pfn)
 {
 	struct xen_hvm_get_mem_type a = {
 		.domid = DOMID_SELF,
@@ -23,15 +23,18 @@ static int xen_oldmem_pfn_is_ram(unsigned long pfn)
 	};
 
 	if (HYPERVISOR_hvm_op(HVMOP_get_mem_type, &a))
-		return -ENXIO;
+		return true;
 
 	switch (a.mem_type) {
 	case HVMMEM_mmio_dm:
-		return 0;
+		return false;
 	default:
-		return 1;
+		return true;
 	}
 }
+static struct vmcore_cb xen_vmcore_cb = {
+	.pfn_is_ram = xen_vmcore_pfn_is_ram,
+};
 #endif
 
 static void xen_hvm_exit_mmap(struct mm_struct *mm)
@@ -65,6 +68,6 @@ void __init xen_hvm_init_mmu_ops(void)
 	if (is_pagetable_dying_supported())
 		pv_ops.mmu.exit_mmap = xen_hvm_exit_mmap;
 #ifdef CONFIG_PROC_VMCORE
-	WARN_ON(register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram));
+	register_vmcore_cb(&xen_vmcore_cb);
 #endif
 }
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index a9bd80ab670e..7a04b2eca287 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -62,46 +62,75 @@ core_param(novmcoredd, vmcoredd_disabled, bool, 0);
 /* Device Dump Size */
 static size_t vmcoredd_orig_sz;
 
-/*
- * Returns > 0 for RAM pages, 0 for non-RAM pages, < 0 on error
- * The called function has to take care of module refcounting.
- */
-static int (*oldmem_pfn_is_ram)(unsigned long pfn);
-
-int register_oldmem_pfn_is_ram(int (*fn)(unsigned long pfn))
+static DECLARE_RWSEM(vmcore_cb_rwsem);
+/* List of registered vmcore callbacks. */
+static LIST_HEAD(vmcore_cb_list);
+/* Whether we had a surprise unregistration of a callback. */
+static bool vmcore_cb_unstable;
+/* Whether the vmcore has been opened once. */
+static bool vmcore_opened;
+
+void register_vmcore_cb(struct vmcore_cb *cb)
 {
-	if (oldmem_pfn_is_ram)
-		return -EBUSY;
-	oldmem_pfn_is_ram = fn;
-	return 0;
+	down_write(&vmcore_cb_rwsem);
+	INIT_LIST_HEAD(&cb->next);
+	list_add_tail(&cb->next, &vmcore_cb_list);
+	/*
+	 * Registering a vmcore callback after the vmcore was opened is
+	 * very unusual (e.g., manual driver loading).
+	 */
+	if (vmcore_opened)
+		pr_warn_once("Unexpected vmcore callback registration\n");
+	up_write(&vmcore_cb_rwsem);
 }
-EXPORT_SYMBOL_GPL(register_oldmem_pfn_is_ram);
+EXPORT_SYMBOL_GPL(register_vmcore_cb);
 
-void unregister_oldmem_pfn_is_ram(void)
+void unregister_vmcore_cb(struct vmcore_cb *cb)
 {
-	oldmem_pfn_is_ram = NULL;
-	wmb();
+	down_write(&vmcore_cb_rwsem);
+	list_del(&cb->next);
+	/*
+	 * Unregistering a vmcore callback after the vmcore was opened is
+	 * very unusual (e.g., forced driver removal), but we cannot stop
+	 * unregistering.
+	 */
+	if (vmcore_opened) {
+		pr_warn_once("Unexpected vmcore callback unregistration\n");
+		vmcore_cb_unstable = true;
+	}
+	up_write(&vmcore_cb_rwsem);
 }
-EXPORT_SYMBOL_GPL(unregister_oldmem_pfn_is_ram);
+EXPORT_SYMBOL_GPL(unregister_vmcore_cb);
 
 static bool pfn_is_ram(unsigned long pfn)
 {
-	int (*fn)(unsigned long pfn);
-	/* pfn is ram unless fn() checks pagetype */
+	struct vmcore_cb *cb;
 	bool ret = true;
 
-	/*
-	 * Ask hypervisor if the pfn is really ram.
-	 * A ballooned page contains no data and reading from such a page
-	 * will cause high load in the hypervisor.
-	 */
-	fn = oldmem_pfn_is_ram;
-	if (fn)
-		ret = !!fn(pfn);
+	lockdep_assert_held_read(&vmcore_cb_rwsem);
+	if (unlikely(vmcore_cb_unstable))
+		return false;
+
+	list_for_each_entry(cb, &vmcore_cb_list, next) {
+		if (unlikely(!cb->pfn_is_ram))
+			continue;
+		ret = cb->pfn_is_ram(cb, pfn);
+		if (!ret)
+			break;
+	}
 
 	return ret;
 }
 
+static int open_vmcore(struct inode *inode, struct file *file)
+{
+	down_read(&vmcore_cb_rwsem);
+	vmcore_opened = true;
+	up_read(&vmcore_cb_rwsem);
+
+	return 0;
+}
+
 /* Reads a page from the oldmem device from given offset. */
 ssize_t read_from_oldmem(char *buf, size_t count,
 			 u64 *ppos, int userbuf,
@@ -117,6 +146,7 @@ ssize_t read_from_oldmem(char *buf, size_t count,
 	offset = (unsigned long)(*ppos % PAGE_SIZE);
 	pfn = (unsigned long)(*ppos / PAGE_SIZE);
 
+	down_read(&vmcore_cb_rwsem);
 	do {
 		if (count > (PAGE_SIZE - offset))
 			nr_bytes = PAGE_SIZE - offset;
@@ -136,8 +166,10 @@ ssize_t read_from_oldmem(char *buf, size_t count,
 				tmp = copy_oldmem_page(pfn, buf, nr_bytes,
 						       offset, userbuf);
 
-			if (tmp < 0)
+			if (tmp < 0) {
+				up_read(&vmcore_cb_rwsem);
 				return tmp;
+			}
 		}
 		*ppos += nr_bytes;
 		count -= nr_bytes;
@@ -147,6 +179,7 @@ ssize_t read_from_oldmem(char *buf, size_t count,
 		offset = 0;
 	} while (count);
 
+	up_read(&vmcore_cb_rwsem);
 	return read;
 }
 
@@ -537,14 +570,19 @@ static int vmcore_remap_oldmem_pfn(struct vm_area_struct *vma,
 			    unsigned long from, unsigned long pfn,
 			    unsigned long size, pgprot_t prot)
 {
+	int ret;
+
 	/*
 	 * Check if oldmem_pfn_is_ram was registered to avoid
 	 * looping over all pages without a reason.
 	 */
-	if (oldmem_pfn_is_ram)
-		return remap_oldmem_pfn_checked(vma, from, pfn, size, prot);
+	down_read(&vmcore_cb_rwsem);
+	if (!list_empty(&vmcore_cb_list) || vmcore_cb_unstable)
+		ret = remap_oldmem_pfn_checked(vma, from, pfn, size, prot);
 	else
-		return remap_oldmem_pfn_range(vma, from, pfn, size, prot);
+		ret = remap_oldmem_pfn_range(vma, from, pfn, size, prot);
+	up_read(&vmcore_cb_rwsem);
+	return ret;
 }
 
 static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
@@ -668,6 +706,7 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
 #endif
 
 static const struct proc_ops vmcore_proc_ops = {
+	.proc_open	= open_vmcore,
 	.proc_read	= read_vmcore,
 	.proc_lseek	= default_llseek,
 	.proc_mmap	= mmap_vmcore,
diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index 2618577a4d6d..0c547d866f1e 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -91,9 +91,29 @@ static inline void vmcore_unusable(void)
 		elfcorehdr_addr = ELFCORE_ADDR_ERR;
 }
 
-#define HAVE_OLDMEM_PFN_IS_RAM 1
-extern int register_oldmem_pfn_is_ram(int (*fn)(unsigned long pfn));
-extern void unregister_oldmem_pfn_is_ram(void);
+/**
+ * struct vmcore_cb - driver callbacks for /proc/vmcore handling
+ * @pfn_is_ram: check whether a PFN really is RAM and should be accessed when
+ *              reading the vmcore. Will return "true" if it is RAM or if the
+ *              callback cannot tell. If any callback returns "false", it's not
+ *              RAM and the page must not be accessed; zeroes should be
+ *              indicated in the vmcore instead. For example, a ballooned page
+ *              contains no data and reading from such a page will cause high
+ *              load in the hypervisor.
+ * @next: List head to manage registered callbacks internally; initialized by
+ *        register_vmcore_cb().
+ *
+ * vmcore callbacks allow drivers managing physical memory ranges to
+ * coordinate with vmcore handling code, for example, to prevent accessing
+ * physical memory ranges that should not be accessed when reading the vmcore,
+ * although included in the vmcore header as memory ranges to dump.
+ */
+struct vmcore_cb {
+	bool (*pfn_is_ram)(struct vmcore_cb *cb, unsigned long pfn);
+	struct list_head next;
+};
+extern void register_vmcore_cb(struct vmcore_cb *cb);
+extern void unregister_vmcore_cb(struct vmcore_cb *cb);
 
 #else /* !CONFIG_CRASH_DUMP */
 static inline bool is_kdump_kernel(void) { return 0; }
-- 
2.31.1
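
Illustration (not part of the patch): the callback semantics described above can be modelled in plain user-space C. This is only a sketch under stated assumptions. It replaces the kernel's list_head with a singly linked list, omits the rwsem locking entirely, and every name prefixed with "model_" as well as struct hole_dev and hole_pfn_is_ram() are hypothetical stand-ins that do not exist in the kernel. It shows the three rules the patch introduces: a PFN counts as RAM unless some callback says otherwise, the first "false" answer wins, and an unregistration after the vmcore was opened makes every PFN read as not-RAM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-in for struct vmcore_cb (illustrative only). */
struct vmcore_cb {
	bool (*pfn_is_ram)(struct vmcore_cb *cb, unsigned long pfn);
	struct vmcore_cb *next;	/* simplification: not a struct list_head */
};

struct vmcore_cb *model_cb_list;	/* registered callbacks, in order */
bool model_cb_unstable;			/* set by a late unregistration */
bool model_vmcore_opened;		/* set once the vmcore was opened */

/* Registration cannot fail; the callback is appended at the tail. */
void model_register_cb(struct vmcore_cb *cb)
{
	struct vmcore_cb **pos = &model_cb_list;

	assert(cb != NULL);
	cb->next = NULL;
	while (*pos)
		pos = &(*pos)->next;
	*pos = cb;
}

/* Unlink the callback; after open, mark the vmcore as unstable. */
void model_unregister_cb(struct vmcore_cb *cb)
{
	struct vmcore_cb **pos = &model_cb_list;

	while (*pos && *pos != cb)
		pos = &(*pos)->next;
	if (*pos)
		*pos = cb->next;
	if (model_vmcore_opened)
		model_cb_unstable = true;
}

/* RAM unless unstable or any registered callback answers "false". */
bool model_pfn_is_ram(unsigned long pfn)
{
	struct vmcore_cb *cb;

	if (model_cb_unstable)
		return false;
	for (cb = model_cb_list; cb; cb = cb->next) {
		if (!cb->pfn_is_ram)
			continue;
		if (!cb->pfn_is_ram(cb, pfn))
			return false;
	}
	return true;
}

/* Hypothetical driver state: PFNs in [start, end) must not be read. */
struct hole_dev {
	unsigned long start;
	unsigned long end;
	struct vmcore_cb cb;
};

/* Recover the per-device state from the embedded callback structure. */
bool hole_pfn_is_ram(struct vmcore_cb *cb, unsigned long pfn)
{
	struct hole_dev *dev =
		(struct hole_dev *)((char *)cb - offsetof(struct hole_dev, cb));

	return pfn < dev->start || pfn >= dev->end;
}
```

Passing the vmcore_cb pointer back into the callback is what lets one callback function serve several devices, each embedding its own vmcore_cb; this is presumably the pattern intended for the per-device virtio-mem callbacks mentioned in the commit message. A caller would embed the structure, set .pfn_is_ram, and call model_register_cb(&dev.cb).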