From nobody Fri Dec 19 20:50:59 2025
Date: Wed, 15 Oct 2025 18:35:52 +0000
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
X-Mailer: git-send-email 2.51.0.788.g6d19910ace-goog
Message-ID: <57ed0bcbcfcec6fda89d60727467d7bd621c95ab.1760551864.git.wyihan@google.com>
Subject: [RFC PATCH 1/3] mm: memory_failure: Fix MF_DELAYED handling on truncation during failure
From: Lisa Wang
To: linmiaohe@huawei.com, nao.horiguchi@gmail.com, akpm@linux-foundation.org, pbonzini@redhat.com, shuah@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.or, linux-kselftest@vger.kernel.org
Cc: david@redhat.com, rientjes@google.com, seanjc@google.com, ackerleytng@google.com, vannapurve@google.com, michael.roth@amd.com, jiaqiyan@google.com, tabba@google.com, dave.hansen@linux.intel.com, Lisa Wang
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

The .error_remove_folio a_ops is used by different filesystems to handle
folio truncation upon discovery of a memory failure in the memory
associated with the given folio.

Currently, MF_DELAYED is treated as an error, causing "Failed to punch
page" to be written to the console. MF_DELAYED is then relayed to the
caller of truncate_error_folio() as MF_FAILED. This further causes
memory_failure() to return -EBUSY, which then always causes a SIGBUS.

This also implies that regardless of whether the thread's memory
corruption kill policy is PR_MCE_KILL_EARLY or PR_MCE_KILL_LATE, a memory
failure within guest_memfd memory will always cause a SIGBUS.

Update truncate_error_folio() to return MF_DELAYED to the caller if the
.error_remove_folio() callback reports MF_DELAYED.

Generalize the comment: MF_DELAYED means memory failure was handled and
some other part of memory failure will be handled later (e.g. a next
access will result in the process being killed). Specifically for
guest_memfd, a next access by the guest will result in an error returned
to the userspace VMM.

With delayed handling, the filemap continues to hold refcounts on the folio.
Hence, take that into account when checking for extra refcounts in
me_pagecache_clean(). This is aligned with the implementation in
me_swapcache_dirty(), where, if a folio is still in the swap cache,
extra_pins is set to true.

Signed-off-by: Lisa Wang
---
 mm/memory-failure.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index df6ee59527dd..77f665c16a73 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -922,9 +922,11 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
  * by the m-f() handler immediately.
  *
  * MF_DELAYED - The m-f() handler marks the page as PG_hwpoisoned'ed.
- * The page is unmapped, and is removed from the LRU or file mapping.
- * An attempt to access the page again will trigger page fault and the
- * PF handler will kill the process.
+ * It means memory_failure was handled (e.g. removed from file mapping or the
+ * LRU) and some other part of memory failure will be handled later (e.g. a
+ * next access will result in the process being killed). Specifically for
+ * guest_memfd, a next access by the guest will result in an error returned to
+ * the userspace VMM.
  *
  * MF_RECOVERED - The m-f() handler marks the page as PG_hwpoisoned'ed.
  * The page has been completely isolated, that is, unmapped, taken out of
@@ -999,6 +1001,9 @@ static int truncate_error_folio(struct folio *folio, unsigned long pfn,
 	if (mapping->a_ops->error_remove_folio) {
 		int err = mapping->a_ops->error_remove_folio(mapping, folio);
 
+		if (err == MF_DELAYED)
+			return err;
+
 		if (err != 0)
 			pr_info("%#lx: Failed to punch page: %d\n", pfn, err);
 		else if (!filemap_release_folio(folio, GFP_NOIO))
@@ -1108,18 +1113,19 @@ static int me_pagecache_clean(struct page_state *ps, struct page *p)
 		goto out;
 	}
 
-	/*
-	 * The shmem page is kept in page cache instead of truncating
-	 * so is expected to have an extra refcount after error-handling.
-	 */
-	extra_pins = shmem_mapping(mapping);
-
 	/*
 	 * Truncation is a bit tricky. Enable it per file system for now.
 	 *
 	 * Open: to take i_rwsem or not for this? Right now we don't.
 	 */
 	ret = truncate_error_folio(folio, page_to_pfn(p), mapping);
+
+	/*
+	 * The shmem page, or any page with MF_DELAYED error handling, is kept in
+	 * page cache instead of truncating, so is expected to have an extra
+	 * refcount after error-handling.
+	 */
+	extra_pins = shmem_mapping(mapping) || ret == MF_DELAYED;
 	if (has_extra_refcount(ps, p, extra_pins))
 		ret = MF_FAILED;
 
-- 
2.51.0.788.g6d19910ace-goog

From nobody Fri Dec 19 20:50:59 2025
Date: Wed, 15 Oct 2025 18:35:53 +0000
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
X-Mailer: git-send-email 2.51.0.788.g6d19910ace-goog
Subject: [RFC PATCH 2/3] KVM: selftests: Add memory failure tests in guest_memfd_test
From: Lisa Wang
To: linmiaohe@huawei.com, nao.horiguchi@gmail.com, akpm@linux-foundation.org, pbonzini@redhat.com, shuah@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.or, linux-kselftest@vger.kernel.org
Cc: david@redhat.com, rientjes@google.com, seanjc@google.com, ackerleytng@google.com, vannapurve@google.com, michael.roth@amd.com, jiaqiyan@google.com, tabba@google.com, dave.hansen@linux.intel.com, Lisa Wang
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

After modifying truncate_error_folio(), we expect memory_failure() to
return 0 instead of MF_FAILED. Also, we want to make sure that the
signaling behavior of memory_failure() stays the same.

Test that memory_failure() returns 0 for guest_memfd, where
.error_remove_folio() is handled by not actually truncating, and returning
MF_DELAYED.

In addition, test that SIGBUS signaling behavior is not changed before and
after this modification.

There are two kinds of guest memory failure injection: madvise and debugfs.
When memory failure is injected using madvise, the MF_ACTION_REQUIRED flag
is set, and the page is mapped and dirty, the process should get a SIGBUS.
When memory failure is injected using debugfs, the KILL_EARLY machine
check memory corruption kill policy is set, and the page is mapped and
dirty, the process should get a SIGBUS.

Co-developed-by: Ackerley Tng
Signed-off-by: Ackerley Tng
Signed-off-by: Lisa Wang
---
 .../testing/selftests/kvm/guest_memfd_test.c  | 168 ++++++++++++++++++
 1 file changed, 168 insertions(+)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index e7d9aeb418d3..7bcf8d2d5d4d 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -10,6 +10,8 @@
 #include
 #include
 #include
+#include
+#include
 
 #include
 #include
@@ -97,6 +99,171 @@ static void test_fault_overflow(int fd, size_t total_size)
 	test_fault_sigbus(fd, total_size, total_size * 4);
 }
 
+static unsigned long addr_to_pfn(void *addr)
+{
+	const uint64_t pagemap_pfn_mask = BIT(54) - 1;
+	const uint64_t pagemap_page_present = BIT(63);
+	uint64_t page_info;
+	ssize_t n_bytes;
+	int pagemap_fd;
+
+	pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
+	TEST_ASSERT(pagemap_fd > 0, "Opening pagemap should succeed.");
+
+	n_bytes = pread(pagemap_fd, &page_info, 8, (uint64_t)addr / page_size * 8);
+	TEST_ASSERT(n_bytes == 8, "pread of pagemap failed. n_bytes=%ld", n_bytes);
+
+	close(pagemap_fd);
+
+	TEST_ASSERT(page_info & pagemap_page_present, "The page for addr should be present");
+	return page_info & pagemap_pfn_mask;
+}
+
+static void write_memory_failure(unsigned long pfn, bool mark, int return_code)
+{
+	char path[PATH_MAX];
+	char *filename;
+	char buf[20];
+	int ret;
+	int len;
+	int fd;
+
+	filename = mark ? "corrupt-pfn" : "unpoison-pfn";
+	snprintf(path, PATH_MAX, "/sys/kernel/debug/hwpoison/%s", filename);
+
+	fd = open(path, O_WRONLY);
+	TEST_ASSERT(fd > 0, "Failed to open %s.", path);
+
+	len = snprintf(buf, sizeof(buf), "0x%lx\n", pfn);
+	if (len < 0 || (unsigned int)len > sizeof(buf))
+		TEST_ASSERT(0, "snprintf failed or truncated.");
+
+	ret = write(fd, buf, len);
+	if (return_code == 0) {
+		/*
+		 * If the memory_failure() returns 0, write() should be successful,
+		 * which returns how many bytes it writes.
+		 */
+		TEST_ASSERT(ret > 0, "Writing memory failure (path: %s) failed: %s", path,
+			    strerror(errno));
+	} else {
+		TEST_ASSERT_EQ(ret, -1);
+		/* errno is memory_failure() return code. */
+		TEST_ASSERT_EQ(errno, return_code);
+	}
+
+	close(fd);
+}
+
+static void mark_memory_failure(unsigned long pfn, int return_code)
+{
+	write_memory_failure(pfn, true, return_code);
+}
+
+static void unmark_memory_failure(unsigned long pfn, int return_code)
+{
+	write_memory_failure(pfn, false, return_code);
+}
+
+enum memory_failure_injection_method {
+	MF_INJECT_DEBUGFS,
+	MF_INJECT_MADVISE,
+};
+
+static void do_test_memory_failure(int fd, size_t total_size,
+				   enum memory_failure_injection_method method, int kill_config,
+				   bool map_page, bool dirty_page, bool sigbus_expected,
+				   int return_code)
+{
+	unsigned long memory_failure_pfn;
+	char *memory_failure_addr;
+	char *mem;
+	int ret;
+
+	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT(mem != MAP_FAILED, "mmap() for guest_memfd should succeed.");
+	memory_failure_addr = mem + page_size;
+	if (dirty_page)
+		*memory_failure_addr = 'A';
+	else
+		READ_ONCE(*memory_failure_addr);
+
+	/* Fault in page to read pfn, then unmap page for testing if needed. */
+	memory_failure_pfn = addr_to_pfn(memory_failure_addr);
+	if (!map_page)
+		madvise(memory_failure_addr, page_size, MADV_DONTNEED);
+
+	ret = prctl(PR_MCE_KILL, PR_MCE_KILL_SET, kill_config, 0, 0);
+	TEST_ASSERT_EQ(ret, 0);
+
+	ret = 0;
+	switch (method) {
+	case MF_INJECT_DEBUGFS: {
+		/* DEBUGFS injection handles return_code test inside the mark_memory_failure(). */
+		if (sigbus_expected)
+			TEST_EXPECT_SIGBUS(mark_memory_failure(memory_failure_pfn, return_code));
+		else
+			mark_memory_failure(memory_failure_pfn, return_code);
+		break;
+	}
+	case MF_INJECT_MADVISE: {
+		/*
+		 * MADV_HWPOISON uses get_user_pages() so the page will always
+		 * be faulted in at the point of memory_failure()
+		 */
+		if (sigbus_expected)
+			TEST_EXPECT_SIGBUS(ret = madvise(memory_failure_addr,
+							 page_size, MADV_HWPOISON));
+		else
+			ret = madvise(memory_failure_addr, page_size, MADV_HWPOISON);
+
+		if (return_code == 0)
+			TEST_ASSERT(ret == return_code, "Memory failure failed. Errno: %s",
+				    strerror(errno));
+		else {
+			/* errno is memory_failure() return code. */
+			TEST_ASSERT_EQ(errno, return_code);
+		}
+		break;
+	}
+	default:
+		TEST_FAIL("Unhandled memory failure injection method %d.", method);
+	}
+
+	TEST_EXPECT_SIGBUS(READ_ONCE(*memory_failure_addr));
+	TEST_EXPECT_SIGBUS(*memory_failure_addr = 'A');
+
+	ret = munmap(mem, total_size);
+	TEST_ASSERT(!ret, "munmap() should succeed.");
+
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0,
+			total_size);
+	TEST_ASSERT(!ret, "Truncate the entire file (cleanup) should succeed.");
+
+	ret = prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_DEFAULT, 0, 0);
+	TEST_ASSERT_EQ(ret, 0);
+
+	unmark_memory_failure(memory_failure_pfn, 0);
+}
+
+static void test_memory_failure(int fd, size_t total_size)
+{
+	do_test_memory_failure(fd, total_size, MF_INJECT_DEBUGFS, PR_MCE_KILL_EARLY, true, true, true, 0);
+	do_test_memory_failure(fd, total_size, MF_INJECT_DEBUGFS, PR_MCE_KILL_EARLY, true, false, false, 0);
+	do_test_memory_failure(fd, total_size, MF_INJECT_DEBUGFS, PR_MCE_KILL_EARLY, false, true, false, 0);
+	do_test_memory_failure(fd, total_size, MF_INJECT_DEBUGFS, PR_MCE_KILL_LATE, true, true, false, 0);
+	do_test_memory_failure(fd, total_size, MF_INJECT_DEBUGFS, PR_MCE_KILL_LATE, true, false, false, 0);
+	do_test_memory_failure(fd, total_size, MF_INJECT_DEBUGFS, PR_MCE_KILL_LATE, false, true, false, 0);
+	/*
+	 * If madvise() is used to inject errors, memory_failure() handling is invoked with the
+	 * MF_ACTION_REQUIRED flag set, aligned with memory failure handling for a consumed memory
+	 * error, where the machine check memory corruption kill policy is ignored. Hence, testing with
+	 * PR_MCE_KILL_DEFAULT covers all cases.
+	 */
+	do_test_memory_failure(fd, total_size, MF_INJECT_MADVISE, PR_MCE_KILL_DEFAULT, true, true, true, 0);
+	do_test_memory_failure(fd, total_size, MF_INJECT_MADVISE, PR_MCE_KILL_DEFAULT, true, false, false, 0);
+}
+
 static void test_fault_private(int fd, size_t total_size)
 {
 	test_fault_sigbus(fd, 0, total_size);
@@ -273,6 +440,7 @@ static void __test_guest_memfd(struct kvm_vm *vm, uint64_t flags)
 	if (flags & GUEST_MEMFD_FLAG_INIT_SHARED) {
 		gmem_test(mmap_supported, vm, flags);
 		gmem_test(fault_overflow, vm, flags);
+		gmem_test(memory_failure, vm, flags);
 	} else {
 		gmem_test(fault_private, vm, flags);
 	}
-- 
2.51.0.788.g6d19910ace-goog

From nobody Fri Dec 19 20:50:59 2025
Date: Wed, 15 Oct 2025 18:35:54 +0000
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
X-Mailer: git-send-email 2.51.0.788.g6d19910ace-goog
Message-ID: <0ec7349858142439ed0a250e6c04edf84cb0f488.1760551864.git.wyihan@google.com>
Subject: [RFC PATCH 3/3] KVM: selftests: Test guest_memfd behavior with respect to stage 2 page tables
From: Lisa Wang
To: linmiaohe@huawei.com, nao.horiguchi@gmail.com, akpm@linux-foundation.org, pbonzini@redhat.com, shuah@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.or, linux-kselftest@vger.kernel.org
Cc: david@redhat.com, rientjes@google.com, seanjc@google.com, ackerleytng@google.com, vannapurve@google.com, michael.roth@amd.com, jiaqiyan@google.com, tabba@google.com, dave.hansen@linux.intel.com, Lisa Wang
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Test that

+ memory failure handling results in unmapping of bad memory from stage 2
  page tables, hence requiring faulting on next guest access
+ when the guest tries to fault a poisoned page from guest_memfd, the
  userspace VMM is informed with EHWPOISON

Co-developed-by: Ackerley Tng
Signed-off-by: Ackerley Tng
Signed-off-by: Lisa Wang
---
 .../testing/selftests/kvm/guest_memfd_test.c  | 65 +++++++++++++++++++
 1 file changed, 65 insertions(+)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 7bcf8d2d5d4d..dc3398e22edd 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -539,6 +539,70 @@ static void test_guest_memfd_guest(void)
 	kvm_vm_free(vm);
 }
 
+static void __guest_code_read(uint8_t *mem)
+{
+	READ_ONCE(*mem);
+	GUEST_DONE();
+}
+
+static void guest_read(struct kvm_vcpu *vcpu, uint64_t gpa, int expected_errno)
+{
+	vcpu_arch_set_entry_point(vcpu, __guest_code_read);
+	vcpu_args_set(vcpu, 1, gpa);
+
+	if (expected_errno) {
+		TEST_ASSERT_EQ(_vcpu_run(vcpu), -1);
+		TEST_ASSERT_EQ(errno, expected_errno);
+	} else {
+		vcpu_run(vcpu);
+		TEST_ASSERT_EQ(get_ucall(vcpu, NULL), UCALL_DONE);
+	}
+}
+
+static void test_memory_failure_guest(void)
+{
+	const uint64_t gpa = SZ_4G;
+	const int slot = 1;
+
+	unsigned long memory_failure_pfn;
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+	uint8_t *mem;
+	size_t size;
+	int fd;
+
+	if (!kvm_has_cap(KVM_CAP_GUEST_MEMFD_FLAGS))
+		return;
+
+	vm = __vm_create_shape_with_one_vcpu(VM_SHAPE_DEFAULT, &vcpu, 1, __guest_code_read);
+
+	size = vm->page_size;
+	fd = vm_create_guest_memfd(vm, size, GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED);
+	vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, size, NULL, fd, 0);
+
+	mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT(mem != MAP_FAILED, "mmap() for guest_memfd should succeed.");
+	virt_pg_map(vm, gpa, gpa);
+
+	/* Fault in page to read pfn, then unmap page for testing. */
+	READ_ONCE(*mem);
+	memory_failure_pfn = addr_to_pfn(mem);
+	munmap(mem, size);
+
+	/* Fault page into stage2 page tables. */
+	guest_read(vcpu, gpa, 0);
+
+	mark_memory_failure(memory_failure_pfn, 0);
+
+	guest_read(vcpu, gpa, EHWPOISON);
+	munmap(mem, size);
+
+	close(fd);
+	kvm_vm_free(vm);
+
+	unmark_memory_failure(memory_failure_pfn, 0);
+}
+
 int main(int argc, char *argv[])
 {
 	unsigned long vm_types, vm_type;
@@ -559,4 +623,5 @@ int main(int argc, char *argv[])
 		test_guest_memfd(vm_type);
 
 	test_guest_memfd_guest();
+	test_memory_failure_guest();
 }
-- 
2.51.0.788.g6d19910ace-goog
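
As a reading aid for the series, the result handling that patch 1/3 changes
can be modelled in plain userspace C. This is a simplified sketch under
stated assumptions, not kernel code: the model_* function names, the
err/released/refs_beyond_handler parameters and the numeric enum values are
hypothetical stand-ins, and only the path where the mapping provides an
.error_remove_folio() callback is covered.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative copy of the result names; the numeric values are assumptions. */
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

/*
 * Hypothetical model of truncate_error_folio() with the series applied.
 * err:      what the .error_remove_folio() callback returned
 * released: whether releasing the folio's buffers succeeded
 */
static enum mf_result model_truncate_error_folio(int err, bool released)
{
	if (err == MF_DELAYED)
		return MF_DELAYED;	/* new: delayed handling is passed through */
	if (err != 0)
		return MF_FAILED;	/* the "Failed to punch page" case */
	return released ? MF_RECOVERED : MF_FAILED;
}

/*
 * Hypothetical model of the tail of me_pagecache_clean().
 * refs_beyond_handler: references other than the handler's own
 * nr_pages:            folio size in pages
 */
static enum mf_result model_me_pagecache_clean(int err, bool released,
						bool is_shmem,
						int refs_beyond_handler,
						int nr_pages)
{
	enum mf_result ret = model_truncate_error_folio(err, released);
	/* A folio left in the page cache is expected to stay pinned by it. */
	bool extra_pins = is_shmem || ret == MF_DELAYED;
	int unexpected = refs_beyond_handler - (extra_pins ? nr_pages : 0);

	if (unexpected > 0)
		ret = MF_FAILED;
	return ret;
}

/* Small self-check that walks through three representative cases. */
static void model_self_check(void)
{
	/* guest_memfd-like case: delayed handling, filemap keeps one reference. */
	assert(model_me_pagecache_clean(MF_DELAYED, false, false, 1, 1) == MF_DELAYED);
	/* Successful truncation with no leftover references. */
	assert(model_me_pagecache_clean(0, true, false, 0, 1) == MF_RECOVERED);
	/* An unexpected extra reference still downgrades the result. */
	assert(model_me_pagecache_clean(MF_DELAYED, false, false, 2, 1) == MF_FAILED);
}
```

Calling model_self_check() from a small test program exercises the three
cases. Per the changelog of patch 1/3, the first case is the one that was
previously reported as MF_FAILED.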