From nobody Mon May 25 09:56:50 2026
Date: Mon, 18 May 2026 09:36:31 +0000
Message-ID: <9ff4237649ace8b7eb78f3c3ec9ffaf5ae49ae6a.1779080766.git.tarunsahu@google.com>
Subject: [RFC PATCH v1 1/9] liveupdate: luo_file: Add internal APIs for file preservation
From: Tarun Sahu
To: axelrasmussen@google.com, mark.rutland@arm.com, skhawaja@google.com,
 Mike Rapoport, sagis@google.com, Jason Gunthorpe, Shuah Khan,
 ackerleytng@google.com, corbet@lwn.net, dmatlack@google.com,
 Paolo Bonzini, Andrew Morton, vannapurve@google.com, Pratyush Yadav,
 david@redhat.com, aneesh.kumar@kernel.org, vipinsh@google.com,
 Alexander Graf, David Hildenbrand, Pasha Tatashin
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 kexec@lists.infradead.org, linux-kselftest@vger.kernel.org,
 kvm@vger.kernel.org, Tarun Sahu

From: Pasha Tatashin

The core liveupdate mechanism allows userspace to preserve file
descriptors. However, kernel subsystems often manage struct file
objects directly and need to participate in the preservation process
programmatically without relying solely on userspace interaction.

Signed-off-by: Pasha Tatashin
Signed-off-by: Samiullah Khawaja
Signed-off-by: Tarun Sahu
---
 include/linux/liveupdate.h       | 21 ++++++++++
 kernel/liveupdate/luo_file.c     | 69 ++++++++++++++++++++++++++++++++
 kernel/liveupdate/luo_internal.h | 17 ++++++++
 3 files changed, 107 insertions(+)

diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h
index 30c5a39ff9e9..de052438eaac 100644
--- a/include/linux/liveupdate.h
+++ b/include/linux/liveupdate.h
@@ -24,6 +24,7 @@ struct file;
 /**
  * struct liveupdate_file_op_args - Arguments for file operation callbacks.
  * @handler:          The file handler being called.
+ * @session:          The session this file belongs to.
  * @retrieve_status:  The retrieve status for the 'can_finish / finish'
  *                    operation. A value of 0 means the retrieve has not been
  *                    attempted, a positive value means the retrieve was
@@ -44,6 +45,7 @@ struct file;
  */
 struct liveupdate_file_op_args {
 	struct liveupdate_file_handler *handler;
+	struct liveupdate_session *session;
 	int retrieve_status;
 	struct file *file;
 	u64 serialized_data;
@@ -240,6 +242,13 @@ void liveupdate_unregister_flb(struct liveupdate_file_handler *fh,
 
 int liveupdate_flb_get_incoming(struct liveupdate_flb *flb, void **objp);
 int liveupdate_flb_get_outgoing(struct liveupdate_flb *flb, void **objp);
+/* kernel can internally retrieve files */
+int liveupdate_get_file_incoming(struct liveupdate_session *s, u64 token,
+				 struct file **filep);
+
+/* Get a token for an outgoing file, or -ENOENT if file is not preserved */
+int liveupdate_get_token_outgoing(struct liveupdate_session *s,
+				  struct file *file, u64 *tokenp);
 
 #else /* CONFIG_LIVEUPDATE */
 
@@ -285,5 +294,17 @@ static inline int liveupdate_flb_get_outgoing(struct liveupdate_flb *flb,
 	return -EOPNOTSUPP;
 }
 
+static inline int liveupdate_get_file_incoming(struct liveupdate_session *s,
+					       u64 token, struct file **filep)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int liveupdate_get_token_outgoing(struct liveupdate_session *s,
+						struct file *file, u64 *tokenp)
+{
+	return -EOPNOTSUPP;
+}
+
 #endif /* CONFIG_LIVEUPDATE */
 #endif /* _LINUX_LIVEUPDATE_H */
diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c
index a0a419085e28..0aa0b4e5339f 100644
--- a/kernel/liveupdate/luo_file.c
+++ b/kernel/liveupdate/luo_file.c
@@ -323,6 +323,7 @@ int luo_preserve_file(struct luo_file_set *file_set, u64 token, int fd)
 	mutex_init(&luo_file->mutex);
 
 	args.handler = fh;
+	args.session = luo_session_from_file_set(file_set);
 	args.file = file;
 	err = fh->ops->preserve(&args);
 	if (err)
@@ -380,6 +381,7 @@ void luo_file_unpreserve_files(struct luo_file_set *file_set)
 					    struct luo_file, list);
 
 	args.handler = luo_file->fh;
+	args.session = luo_session_from_file_set(file_set);
 	args.file = luo_file->file;
 	args.serialized_data = luo_file->serialized_data;
 	args.private_data = luo_file->private_data;
@@ -411,6 +413,7 @@ static int luo_file_freeze_one(struct luo_file_set *file_set,
 	struct liveupdate_file_op_args args = {0};
 
 	args.handler = luo_file->fh;
+	args.session = luo_session_from_file_set(file_set);
 	args.file = luo_file->file;
 	args.serialized_data = luo_file->serialized_data;
 	args.private_data = luo_file->private_data;
@@ -432,6 +435,7 @@ static void luo_file_unfreeze_one(struct luo_file_set *file_set,
 	struct liveupdate_file_op_args args = {0};
 
 	args.handler = luo_file->fh;
+	args.session = luo_session_from_file_set(file_set);
 	args.file = luo_file->file;
 	args.serialized_data = luo_file->serialized_data;
 	args.private_data = luo_file->private_data;
@@ -621,6 +625,7 @@ int luo_retrieve_file(struct luo_file_set *file_set, u64 token,
 	}
 
 	args.handler = luo_file->fh;
+	args.session = luo_session_from_file_set(file_set);
 	args.serialized_data = luo_file->serialized_data;
 	err = luo_file->fh->ops->retrieve(&args);
 	if (err) {
@@ -654,6 +659,7 @@ static int luo_file_can_finish_one(struct luo_file_set *file_set,
 	struct liveupdate_file_op_args args = {0};
 
 	args.handler = luo_file->fh;
+	args.session = luo_session_from_file_set(file_set);
 	args.file = luo_file->file;
 	args.serialized_data = luo_file->serialized_data;
 	args.retrieve_status = luo_file->retrieve_status;
@@ -671,6 +677,7 @@ static void luo_file_finish_one(struct luo_file_set *file_set,
 	guard(mutex)(&luo_file->mutex);
 
 	args.handler = luo_file->fh;
+	args.session = luo_session_from_file_set(file_set);
 	args.file = luo_file->file;
 	args.serialized_data = luo_file->serialized_data;
 	args.retrieve_status = luo_file->retrieve_status;
@@ -924,3 +931,65 @@ void liveupdate_unregister_file_handler(struct liveupdate_file_handler *fh)
 	luo_flb_unregister_all(fh);
 	list_del(&ACCESS_PRIVATE(fh, list));
 }
+EXPORT_SYMBOL_GPL(liveupdate_unregister_file_handler);
+
+/**
+ * liveupdate_get_token_outgoing - Get the token for a preserved file.
+ * @s:      The outgoing liveupdate session.
+ * @file:   The file object to search for.
+ * @tokenp: Output parameter for the found token.
+ *
+ * Searches the list of preserved files in an outgoing session for a matching
+ * file object. If found, the corresponding user-provided token is returned.
+ *
+ * This function is intended for in-kernel callers that need to correlate a
+ * file with its liveupdate token.
+ *
+ * Context: It must be called with session mutex acquired.
+ * Return: 0 on success, -ENOENT if the file is not preserved in this session.
+ */
+int liveupdate_get_token_outgoing(struct liveupdate_session *s,
+				  struct file *file, u64 *tokenp)
+{
+	struct luo_file_set *file_set = luo_file_set_from_session_locked(s);
+	struct luo_file *luo_file;
+	int err = -ENOENT;
+
+	list_for_each_entry(luo_file, &file_set->files_list, list) {
+		if (luo_file->file == file) {
+			if (tokenp)
+				*tokenp = luo_file->token;
+			err = 0;
+			break;
+		}
+	}
+
+	return err;
+}
+
+/**
+ * liveupdate_get_file_incoming - Retrieves a preserved file for in-kernel use.
+ * @s:      The incoming liveupdate session (restored from the previous kernel).
+ * @token:  The unique token identifying the file to retrieve.
+ * @filep:  On success, this will be populated with a pointer to the retrieved
+ *          'struct file'.
+ *
+ * Provides a kernel-internal API for other subsystems to retrieve their
+ * preserved files after a live update. This function is a simple wrapper
+ * around luo_retrieve_file(), allowing callers to find a file by its token.
+ *
+ * The caller receives a new reference to the file and must call fput() when it
+ * is no longer needed. The file's lifetime is managed by LUO and any userspace
+ * file descriptors. If the caller needs to hold a reference to the file beyond
+ * the immediate scope, it must call get_file() itself.
+ *
+ * Context: It must be called with session mutex acquired of a restored session.
+ * Return: 0 on success. Returns -ENOENT if no file with the matching token is
+ *         found, or any other negative errno on failure.
+ */
+int liveupdate_get_file_incoming(struct liveupdate_session *s, u64 token,
+				 struct file **filep)
+{
+	return luo_retrieve_file(luo_file_set_from_session_locked(s),
+				 token, filep);
+}
diff --git a/kernel/liveupdate/luo_internal.h b/kernel/liveupdate/luo_internal.h
index 875844d7a41d..08b198802e7f 100644
--- a/kernel/liveupdate/luo_internal.h
+++ b/kernel/liveupdate/luo_internal.h
@@ -79,6 +79,23 @@ struct luo_session {
 
 extern struct rw_semaphore luo_register_rwlock;
 
+static inline struct liveupdate_session *luo_session_from_file_set(struct luo_file_set *file_set)
+{
+	struct luo_session *session;
+
+	session = container_of(file_set, struct luo_session, file_set);
+
+	return (struct liveupdate_session *)session;
+}
+
+static inline struct luo_file_set *luo_file_set_from_session_locked(struct liveupdate_session *s)
+{
+	struct luo_session *session = (struct luo_session *)s;
+
+	lockdep_assert_held(&session->mutex);
+	return &session->file_set;
+}
+
 int luo_session_create(const char *name, struct file **filep);
 int luo_session_retrieve(const char *name, struct file **filep);
 int __init luo_session_setup_outgoing(void *fdt);
-- 
2.54.0.563.g4f69b47b94-goog

From nobody Mon May 25 09:56:50 2026
Date: Mon, 18 May 2026 09:36:32 +0000
Message-ID: <63776195da42f3307cfa0ad9ee7aa5504e4c2cb1.1779080766.git.tarunsahu@google.com>
Subject: [RFC PATCH v1 2/9] liveupdate: Add LIVEUPDATE_GUEST_MEMFD config option
From: Tarun Sahu
To: axelrasmussen@google.com, mark.rutland@arm.com, skhawaja@google.com,
 Mike Rapoport, sagis@google.com, Jason Gunthorpe, Shuah Khan,
 ackerleytng@google.com, corbet@lwn.net, dmatlack@google.com,
 Paolo Bonzini, Andrew Morton, vannapurve@google.com, Pratyush Yadav,
 david@redhat.com, aneesh.kumar@kernel.org, vipinsh@google.com,
 Alexander Graf, David Hildenbrand, Pasha Tatashin
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 kexec@lists.infradead.org, linux-kselftest@vger.kernel.org,
 kvm@vger.kernel.org, Tarun Sahu

Introduce the LIVEUPDATE_GUEST_MEMFD Kconfig option. This option enables
live update support for KVM guest_memfd files, so that guest_memfd-backed
memory can be preserved across kernel upgrades.

Currently this supports only guest_memfd files that are fully shared and
pre-faulted.

Signed-off-by: Tarun Sahu
---
 kernel/liveupdate/Kconfig | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/kernel/liveupdate/Kconfig b/kernel/liveupdate/Kconfig
index 1a8513f16ef7..0bbc4037192e 100644
--- a/kernel/liveupdate/Kconfig
+++ b/kernel/liveupdate/Kconfig
@@ -88,4 +88,19 @@ config LIVEUPDATE_MEMFD
 
 	  If unsure, say N.
 
+config LIVEUPDATE_GUEST_MEMFD
+	bool "Live update support for guest_memfd"
+	depends on LIVEUPDATE
+	depends on KVM_GUEST_MEMFD
+	default LIVEUPDATE
+	help
+	  Enable live update support for KVM guest_memfd files. This allows
+	  preserving VM Memory backed by guest_memfd file across kernel live
+	  updates.
+
+	  This can only be used for the guest_memfd that are fully-shared
+	  and pre-faulted.
+
+	  If unsure, say N.
+
 endmenu
-- 
2.54.0.563.g4f69b47b94-goog

From nobody Mon May 25 09:56:50 2026
Date: Mon, 18 May 2026 09:36:33 +0000
Message-ID: <24e13c0679157302f0d8385c341169b0ddd4ea40.1779080766.git.tarunsahu@google.com>
Subject: [RFC PATCH v1 3/9] kvm: Prepare core VM structs and helpers for LUO support
From: Tarun Sahu
To: axelrasmussen@google.com, mark.rutland@arm.com, skhawaja@google.com,
 Mike Rapoport, sagis@google.com, Jason Gunthorpe, Shuah Khan,
 ackerleytng@google.com, corbet@lwn.net, dmatlack@google.com,
 Paolo Bonzini, Andrew Morton, vannapurve@google.com, Pratyush Yadav,
 david@redhat.com, aneesh.kumar@kernel.org, vipinsh@google.com,
 Alexander Graf, David Hildenbrand, Pasha Tatashin
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 kexec@lists.infradead.org, linux-kselftest@vger.kernel.org,
 kvm@vger.kernel.org, Tarun Sahu

Introduce core infrastructure to support VM preservation with LUO. The
first two changes are pure refactoring with no functional change; the
third introduces a new member in struct kvm.

- Move ITOA_MAX_LEN to kvm_mm.h for reuse by upcoming kvm_luo code.
- Add a public kvm_create_vm_file() helper wrapping kvm_create_vm() and
  anon_inode_getfile() to provide a unified VM file creation API.
- Track a weak reference to the backing file in struct kvm under
  CONFIG_LIVEUPDATE_GUEST_MEMFD to enable reverse file resolution
  without circular lifetime dependencies.
Signed-off-by: Tarun Sahu
---
 include/linux/kvm_host.h | 14 +++++++
 virt/kvm/kvm_main.c      | 79 +++++++++++++++++++++++++++++-----------
 virt/kvm/kvm_mm.h        |  3 ++
 3 files changed, 75 insertions(+), 21 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4c14aee1fb06..9111a28637af 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -874,6 +874,18 @@ struct kvm {
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
 	/* Protected by slots_lock (for writes) and RCU (for reads) */
 	struct xarray mem_attr_array;
+#endif
+#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
+	/*
+	 * Weak reference to the VFS file backing this KVM instance. Stored
+	 * without incrementing the file refcount to prevent a circular lifetime
+	 * dependency (since file->private_data already pins this struct kvm).
+	 * Used exclusively to resolve the file pointer back from struct kvm.
+	 *
+	 * Written/cleared via rcu_assign_pointer() and read locklessly under
+	 * RCU (e.g. via get_file_active() to prevent ABA races).
+	 */
+	struct file *vm_file;
 #endif
 	char stats_id[KVM_STATS_NAME_SIZE];
 };
@@ -1074,7 +1086,9 @@ void kvm_get_kvm(struct kvm *kvm);
 bool kvm_get_kvm_safe(struct kvm *kvm);
 void kvm_put_kvm(struct kvm *kvm);
 bool file_is_kvm(struct file *file);
+struct file *kvm_create_vm_file(unsigned long type, const char *fdname);
 void kvm_put_kvm_no_destroy(struct kvm *kvm);
+void kvm_uevent_notify_vm_create(struct kvm *kvm);
 
 static inline struct kvm_memslots *__kvm_memslots(struct kvm *kvm, int as_id)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 89489996fbc1..65f0c5fb353e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -67,9 +67,6 @@
 #include
 
 
-/* Worst case buffer size needed for holding an integer. */
-#define ITOA_MAX_LEN 12
-
 MODULE_AUTHOR("Qumranet");
 MODULE_DESCRIPTION("Kernel-based Virtual Machine (KVM) Hypervisor");
 MODULE_LICENSE("GPL");
@@ -1349,6 +1346,19 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
 {
 	struct kvm *kvm = filp->private_data;
 
+#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
+	/*
+	 * Clear the weak reference of the vm file.
+	 * In case vm file is closed by userspace, but kvm still has
+	 * other users like vCPUs, clearing this pointer ensures
+	 * that we don't have a dangling pointer to a closed file.
+	 *
+	 * Cleared via rcu_assign_pointer() to ensure proper memory visibility
+	 * for concurrent lockless readers under RCU.
+	 */
+	rcu_assign_pointer(kvm->vm_file, NULL);
+#endif
+
 	kvm_irqfd_release(kvm);
 
 	kvm_put_kvm(kvm);
@@ -5476,11 +5486,47 @@ bool file_is_kvm(struct file *file)
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(file_is_kvm);
 
+struct file *kvm_create_vm_file(unsigned long type, const char *fdname)
+{
+	struct kvm *kvm = kvm_create_vm(type, fdname);
+	struct file *file;
+
+	if (IS_ERR(kvm))
+		return ERR_CAST(kvm);
+
+	file = anon_inode_getfile("kvm-vm", &kvm_vm_fops, kvm, O_RDWR);
+	if (IS_ERR(file)) {
+		kvm_put_kvm(kvm);
+		return file;
+	}
+
+#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
+	/*
+	 * Weak reference to the file (without get_file()) to prevent a circular
+	 * dependency. Safe because the file's release path clears this pointer
+	 * and drops its reference to the VM.
+	 *
+	 * Written via rcu_assign_pointer() because the pointer can be read
+	 * locklessly under RCU (e.g., in kvm_gmem_luo_preserve() via
+	 * get_file_active() to prevent lockless ABA races).
+	 */
+	rcu_assign_pointer(kvm->vm_file, file);
+#endif
+
+	/*
+	 * Don't call kvm_put_kvm anymore at this point; file->f_op is
+	 * already set, with ->release() being kvm_vm_release(). In error
+	 * cases it will be called by the final fput(file) and will take
+	 * care of doing kvm_put_kvm(kvm).
+ */ + + return file; +} + static int kvm_dev_ioctl_create_vm(unsigned long type) { char fdname[ITOA_MAX_LEN + 1]; int r, fd; - struct kvm *kvm; struct file *file; =20 fd =3D get_unused_fd_flags(O_CLOEXEC); @@ -5489,31 +5535,17 @@ static int kvm_dev_ioctl_create_vm(unsigned long ty= pe) =20 snprintf(fdname, sizeof(fdname), "%d", fd); =20 - kvm =3D kvm_create_vm(type, fdname); - if (IS_ERR(kvm)) { - r =3D PTR_ERR(kvm); - goto put_fd; - } - - file =3D anon_inode_getfile("kvm-vm", &kvm_vm_fops, kvm, O_RDWR); + file =3D kvm_create_vm_file(type, fdname); if (IS_ERR(file)) { r =3D PTR_ERR(file); - goto put_kvm; + goto put_fd; } =20 - /* - * Don't call kvm_put_kvm anymore at this point; file->f_op is - * already set, with ->release() being kvm_vm_release(). In error - * cases it will be called by the final fput(file) and will take - * care of doing kvm_put_kvm(kvm). - */ - kvm_uevent_notify_change(KVM_EVENT_CREATE_VM, kvm); + kvm_uevent_notify_change(KVM_EVENT_CREATE_VM, file->private_data); =20 fd_install(fd, file); return fd; =20 -put_kvm: - kvm_put_kvm(kvm); put_fd: put_unused_fd(fd); return r; @@ -6341,6 +6373,11 @@ static void kvm_uevent_notify_change(unsigned int ty= pe, struct kvm *kvm) kfree(env); } =20 +void kvm_uevent_notify_vm_create(struct kvm *kvm) +{ + kvm_uevent_notify_change(KVM_EVENT_CREATE_VM, kvm); +} + static void kvm_init_debug(void) { const struct file_operations *fops; diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h index 9fcc5d5b7f8d..7aa1d65c3d46 100644 --- a/virt/kvm/kvm_mm.h +++ b/virt/kvm/kvm_mm.h @@ -3,6 +3,9 @@ #ifndef __KVM_MM_H__ #define __KVM_MM_H__ 1 =20 +/* Worst case buffer size needed for holding an integer as a string. */ +#define ITOA_MAX_LEN 12 + /* * Architectures can choose whether to use an rwlock or spinlock * for the mmu_lock. 
These macros, for use in common code --=20 2.54.0.563.g4f69b47b94-goog From nobody Mon May 25 09:56:50 2026 Received: from mail-ed1-f74.google.com (mail-ed1-f74.google.com [209.85.208.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9D9393E834E for ; Mon, 18 May 2026 09:36:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779097010; cv=none; b=I99NcLaDUPANpsTu1d4sPvzUHCpeNzEzKt+Iax5tm64d1q3uAOPKOeY6vryFpypJp1BO5cH1d/JROpUGV1adQ08/anSGu9RStZazRN5ClIzXmzfl6zZq2fr6rCI0jrAGx4HroOyX+fExCGu9ld0CPYp6DIuAEOMJj49ziVtUWFY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779097010; c=relaxed/simple; bh=9QmMIEFTjxCYy9aGb8fw2FvVntyLSsRH/CgUauEVkjs=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=Xk6nGxFlX11Jan6n7XNgYK0DZr8pLUGjzaeoLAaemnUstLN/iIbQBwWKpqLoAoCSzE8HPKlDzGPb/zosK+8vw8bs7rJkrTzWZlUASyx+2gHCZ7qsVty/6pH5CFjUZhdCt7HpnIspuOWIF7BStAj1p+ATEpEMYhszilPHWQIHxR0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--tarunsahu.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=Di9L+kOc; arc=none smtp.client-ip=209.85.208.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--tarunsahu.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Di9L+kOc" Received: by mail-ed1-f74.google.com with SMTP id 4fb4d7f45d1cf-6749e8562a7so1693775a12.3 for ; Mon, 18 May 2026 02:36:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=google.com; s=20251104; t=1779097006; x=1779701806; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=vOz8gAJuYfKjn0utZUC84bAY67+CiHezioPS6ecopLg=; b=Di9L+kOcUroxtM/rnsDrN4p16DtEOktBu9jUf+GT9g54BeYlS9yrpqCPsZ2OgmoE3Z bAERvc/VkyRoolx11vgOhT5Bn3yqawJ6RkeFq5sBTrWFRZlXZW1Kok0AWiNa+iScklx7 vgYHyGoVa4k0ixq/aBGd+YeSq7sy68LjUsK+BML44EJ/rVk8lrZ04z2/pnMZu/yZ97cz J4lMILd33tif7mZXTrXzmCszO2LGWFtCAdR5gJ5v9Gag7bak6lzjeLg3xjTCmpR9yuwm /9TT5KJIAeeRq9T60ZCZbd/jSqV7gGE0iuDxp+5cCezslAHC2vdRxxz09Un8v+4Py+yH JZWA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779097006; x=1779701806; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=vOz8gAJuYfKjn0utZUC84bAY67+CiHezioPS6ecopLg=; b=lh6P10rrugwd8nZZ90MEQgtnP1madwvYh2OWoY/++FMhn49Jj1uPCws5oQe+4jNshz j/4Al0wgyUNaFI7qHl9fyVTwMUHypd2oc8x8bMh/0VbIE5yP5ihXBSssU57/5qyGhESK SMBFi/33eZbBWpqjFlGud8XhMmc2JjvvUAp9yve2FI97kviWQ/Fw/Cteww/+V9+ykQn7 uZKoMdT/GOIbe3UjArfWgpv450IKovEiFF5YAtCGeC3/9/x5CXJNg61poYv85OfMiYYd RCgiJvWq1gmMMBUIDQoLQoUdb7ezq50u5Z6TC7wJsk33rj3ku7cd3pqT3DvuBUQF9m/E n7wQ== X-Gm-Message-State: AOJu0YwZtFsNFmd6eEunmMP2JwDsX+XfCuNDGNf/YAAj7wBhZtDrrWPj ZcWDRIehJj2LxS9n2RmedgsH4AyRZrXagGSsLZC7qXuqpkz+9PCOQSzzacvuQUtPZaSoKlHQj4z 5EDUZCjv3ncqx3d0HCQ== X-Received: from edtc13.prod.google.com ([2002:aa7:c98d:0:b0:67b:7d11:72ff]) (user=tarunsahu job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6402:46d6:b0:678:ce1e:a5cb with SMTP id 4fb4d7f45d1cf-683bd193662mr6858854a12.21.1779097005896; Mon, 18 May 2026 02:36:45 -0700 (PDT) Date: Mon, 18 May 2026 09:36:34 +0000 In-Reply-To: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.54.0.563.g4f69b47b94-goog Message-ID: Subject: [RFC PATCH v1 
4/9] kvm: kvm_luo: Allow kvm preservation with LUO From: Tarun Sahu To: axelrasmussen@google.com, mark.rutland@arm.com, skhawaja@google.com, Mike Rapoport, sagis@google.com, Jason Gunthorpe, Shuah Khan, ackerleytng@google.com, corbet@lwn.net, dmatlack@google.com, Paolo Bonzini, Andrew Morton, vannapurve@google.com, Pratyush Yadav, david@redhat.com, aneesh.kumar@kernel.org, vipinsh@google.com, Alexander Graf, David Hildenbrand, Pasha Tatashin Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kexec@lists.infradead.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, Tarun Sahu Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Introduce KVM VM preservation support for the Live Update Orchestrator (LUO). Register an LUO file handler for KVM files to serialize and deserialize the necessary VM state across live updates. Currently, this preserves the VM type and the generic memory attributes. This implementation provides the infrastructure and dependencies needed for the upcoming guest_memfd preservation support, and it can be extended to preserve more VM state in the future. To preserve the KVM file, the attributes being preserved must not change during or after preservation. A memory attribute change request is triggered by the guest, handled by KVM, and exits to the VMM. The VMM is aware that a live update is in progress and is expected to either cancel the request or pause the VM. This ensures that the guest introduces no memory attribute changes during or after preservation of the KVM file. Retrieval simply creates the VM and populates it with the retrieved data. The only catch is that there is no way to know which fd will be assigned to this KVM file, so an atomically incremented id is used for the fdname. This change also updates MAINTAINERS for kvm_luo.c. Signed-off-by: Tarun Sahu --- My only worry is whether userspace strictly depends on the fdname, i.e. whether it needs to be consistent with vm_fd. 
More details are discussed in the cover letter. I would really appreciate alternatives or other approaches. --- MAINTAINERS | 11 ++ include/linux/kho/abi/kvm.h | 54 ++++++ virt/kvm/Makefile.kvm | 1 + virt/kvm/kvm_luo.c | 346 ++++++++++++++++++++++++++++++++++++ 4 files changed, 412 insertions(+) create mode 100644 include/linux/kho/abi/kvm.h create mode 100644 virt/kvm/kvm_luo.c diff --git a/MAINTAINERS b/MAINTAINERS index c2c6d79275c6..2c26eb17bc0a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -14404,6 +14404,17 @@ S: Maintained F: Documentation/devicetree/bindings/leds/backlight/kinetic,ktz8866.yaml F: drivers/video/backlight/ktz8866.c +KVM LIVE UPDATE +M: Pasha Tatashin +M: Mike Rapoport +M: Pratyush Yadav +R: Tarun Sahu +L: kexec@lists.infradead.org +L: kvm@vger.kernel.org +S: Maintained +T: git git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git +F: virt/kvm/kvm_luo.c + KVM PARAVIRT (KVM/paravirt) M: Paolo Bonzini R: Vitaly Kuznetsov diff --git a/include/linux/kho/abi/kvm.h b/include/linux/kho/abi/kvm.h new file mode 100644 index 000000000000..31bd39588bdd --- /dev/null +++ b/include/linux/kho/abi/kvm.h @@ -0,0 +1,54 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (c) 2026, Google LLC. + * Tarun Sahu + * + * KVM Preservation ABI for Live Update Orchestrator (LUO) + */ +#ifndef _LINUX_KHO_ABI_KVM_H +#define _LINUX_KHO_ABI_KVM_H + +#include +#include + +/** + * DOC: KVM Live Update ABI + * + * KVM uses the ABI defined below for preserving its state + * across a kexec reboot using the LUO. + * + * The state is serialized into a packed structure `struct kvm_luo_ser` + * which is handed over to the next kernel via the KHO mechanism. + * + * This interface is a contract. Any modification to the structure layout + * constitutes a breaking change. Such changes require incrementing the + * version number in the KVM_LUO_FH_COMPATIBLE compatibility string. + */ + +/** + * struct kvm_luo_mem_attr - GFN memory attribute serialization. 
+ * @gfn: Guest Frame Number. + * @attributes: Memory attributes associated with this GFN. + */ +struct kvm_luo_mem_attr { + u64 gfn; + u64 attributes; +} __packed; + +/** + * struct kvm_luo_ser - Main serialization structure for a KVM VM. + * @type: The type of VM. + * @nr_mem_attrs: The number of memory attributes in the array. + * @mem_attrs: KHO vmalloc descriptor pointing to the array of + * struct kvm_luo_mem_attr. + */ +struct kvm_luo_ser { + u64 type; + u64 nr_mem_attrs; + struct kho_vmalloc mem_attrs; +} __packed; + +/* The compatibility string for KVM VM file handler */ +#define KVM_LUO_FH_COMPATIBLE "kvm_vm_luo_v1" + +#endif /* _LINUX_KHO_ABI_KVM_H */ diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm index d047d4cf58c9..c1a962159264 100644 --- a/virt/kvm/Makefile.kvm +++ b/virt/kvm/Makefile.kvm @@ -13,3 +13,4 @@ kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o kvm-$(CONFIG_KVM_GUEST_MEMFD) += $(KVM)/guest_memfd.o +kvm-$(CONFIG_LIVEUPDATE_GUEST_MEMFD) += $(KVM)/kvm_luo.o diff --git a/virt/kvm/kvm_luo.c b/virt/kvm/kvm_luo.c new file mode 100644 index 000000000000..1cf3941c16b7 --- /dev/null +++ b/virt/kvm/kvm_luo.c @@ -0,0 +1,346 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * Copyright (c) 2026, Google LLC. + * Tarun Sahu + * + * KVM VM Preservation for Live Update Orchestrator (LUO) + */ + +/** + * DOC: KVM VM Preservation via LUO + * + * Overview + * ======== + * + * KVM virtual machines (VMs) can be preserved over a kexec reboot using the + * Live Update Orchestrator (LUO) file preservation. This allows userspace + * to preserve KVM VM state across kexec reboots. + * + * The preservation is not intended to be fully transparent. Only specific + * VM configuration and state are preserved, while other aspects of the VM + * must be re-established or re-configured by userspace after retrieval. 
+ * + * Preserved Properties + * ==================== + * + * The following properties of the KVM VM are preserved across kexec: + * + * VM Type + * The VM type (e.g., on x86 architecture, the vm_type parameter) is + * preserved. + * + * Memory Attributes + * All entries in the memory attributes array are preserved. + * + * Non-Preserved Properties + * ======================== + * + * The preservation does not cover: + * + * - vCPUs and vCPU states + * - Memslots / Memory slot layout (memslots) + * - Interrupt controllers and IRQ routings + * - Coalesced MMIO zones + * - Device bindings (VFIO/Eventfds) + * - Active paging or guest registers state + * - etc + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "kvm_mm.h" + +static bool kvm_luo_can_preserve(struct liveupdate_file_handler *handler, + struct file *file) +{ + return file_is_kvm(file); +} + +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES +static int kvm_luo_preserve_mem_attrs(struct kvm *kvm, struct kvm_luo_ser *ser, + struct kvm_luo_mem_attr **mem_attrs_ptr) +{ + struct kvm_luo_mem_attr *mem_attrs = NULL; + unsigned long index; + void *attributes; + u64 count = 0; + int err; + + mutex_lock(&kvm->slots_lock); + + xa_for_each(&kvm->mem_attr_array, index, attributes) { + count++; + } + + if (count == 0) { + mutex_unlock(&kvm->slots_lock); + ser->nr_mem_attrs = 0; + *mem_attrs_ptr = NULL; + return 0; + } + + mem_attrs = vcalloc(count, sizeof(*mem_attrs)); + if (!mem_attrs) { + mutex_unlock(&kvm->slots_lock); + return -ENOMEM; + } + + count = 0; + xa_for_each(&kvm->mem_attr_array, index, attributes) { + mem_attrs[count].gfn = index; + mem_attrs[count].attributes = xa_to_value(attributes); + count++; + } + + mutex_unlock(&kvm->slots_lock); + + ser->nr_mem_attrs = count; + err = kho_preserve_vmalloc(mem_attrs, &ser->mem_attrs); + if (err) 
{ + vfree(mem_attrs); + return err; + } + + *mem_attrs_ptr = mem_attrs; + return 0; +} + +static int kvm_luo_retrieve_mem_attrs(struct kvm *kvm, struct kvm_luo_ser *ser, + bool *mem_attrs_restored_ptr) +{ + struct kvm_luo_mem_attr *mem_attrs; + u64 i; + int err = 0; + + if (!ser->nr_mem_attrs) + return 0; + + mem_attrs = kho_restore_vmalloc(&ser->mem_attrs); + *mem_attrs_restored_ptr = true; + if (!mem_attrs) + return -EINVAL; + + for (i = 0; i < ser->nr_mem_attrs; i++) { + err = xa_err(xa_store(&kvm->mem_attr_array, mem_attrs[i].gfn, + xa_mk_value(mem_attrs[i].attributes), + GFP_KERNEL_ACCOUNT)); + if (err) + break; + } + vfree(mem_attrs); + return err; +} + +static void kvm_luo_retrieve_mem_attrs_cleanup(struct kvm_luo_ser *ser, + bool mem_attrs_restored) +{ + struct kvm_luo_mem_attr *mem_attrs = NULL; + + if (ser->nr_mem_attrs && !mem_attrs_restored) + mem_attrs = kho_restore_vmalloc(&ser->mem_attrs); + vfree(mem_attrs); +} + +static void kvm_luo_unpreserve_mem_attrs(struct kvm_luo_ser *ser) +{ + if (ser && ser->nr_mem_attrs) + kho_unpreserve_vmalloc(&ser->mem_attrs); +} + +static void kvm_luo_finish_mem_attrs(struct kvm_luo_ser *ser) +{ + struct kvm_luo_mem_attr *mem_attrs; + + if (ser && ser->nr_mem_attrs) { + mem_attrs = kho_restore_vmalloc(&ser->mem_attrs); + if (mem_attrs) + vfree(mem_attrs); + } +} +#else +static inline int kvm_luo_preserve_mem_attrs(struct kvm *kvm, + struct kvm_luo_ser *ser, + struct kvm_luo_mem_attr **mem_attrs_ptr) +{ + ser->nr_mem_attrs = 0; + *mem_attrs_ptr = NULL; + return 0; +} + +static inline int kvm_luo_retrieve_mem_attrs(struct kvm *kvm, + struct kvm_luo_ser *ser, + bool *mem_attrs_restored_ptr) +{ + if (ser->nr_mem_attrs) + return -EOPNOTSUPP; + return 0; +} + +static inline void kvm_luo_retrieve_mem_attrs_cleanup(struct kvm_luo_ser *ser, + bool mem_attrs_restored) +{ +} + +static inline void kvm_luo_unpreserve_mem_attrs(struct kvm_luo_ser *ser) +{ +} + +static inline void 
kvm_luo_finish_mem_attrs(struct kvm_luo_ser *ser) +{ +} +#endif + +static int kvm_luo_preserve(struct liveupdate_file_op_args *args) +{ + struct kvm *kvm = args->file->private_data; + struct kvm_luo_mem_attr *mem_attrs = NULL; + struct kvm_luo_ser *ser; + int err = 0; + + if (kvm->vm_dead || kvm->vm_bugged) + return -EINVAL; + + ser = kho_alloc_preserve(sizeof(*ser)); + if (IS_ERR(ser)) + return PTR_ERR(ser); + + err = kvm_luo_preserve_mem_attrs(kvm, ser, &mem_attrs); + if (err) + goto err_free_ser; + +#ifdef CONFIG_X86 + ser->type = kvm->arch.vm_type; +#else + ser->type = 0; +#endif + + args->serialized_data = virt_to_phys(ser); + args->private_data = mem_attrs; + + return 0; + +err_free_ser: + kho_unpreserve_free(ser); + return err; +} + +static atomic_t restored_vm_id = ATOMIC_INIT(0); + +static int kvm_luo_retrieve(struct liveupdate_file_op_args *args) +{ + struct kvm_luo_mem_attr *mem_attrs = NULL; + bool mem_attrs_restored = false; + char fdname[ITOA_MAX_LEN + 1]; + struct kvm_luo_ser *ser; + struct file *file; + struct kvm *kvm; + int err = 0; + + if (!args->serialized_data) + return -EINVAL; + + ser = phys_to_virt(args->serialized_data); + + snprintf(fdname, sizeof(fdname), "%d", + atomic_inc_return(&restored_vm_id)); + + file = kvm_create_vm_file(ser->type, fdname); + if (IS_ERR(file)) { + err = PTR_ERR(file); + goto err_free_ser; + } + + kvm = file->private_data; + + err = kvm_luo_retrieve_mem_attrs(kvm, ser, &mem_attrs_restored); + if (err) + goto err_destroy_file; + + args->file = file; + kho_restore_free(ser); + + kvm_uevent_notify_vm_create(kvm); + return 0; + +err_destroy_file: + fput(file); +err_free_ser: + kvm_luo_retrieve_mem_attrs_cleanup(ser, mem_attrs_restored); + kho_restore_free(ser); + return err; +} + +static void kvm_luo_unpreserve(struct liveupdate_file_op_args *args) +{ + struct kvm_luo_mem_attr *mem_attrs = args->private_data; + struct kvm_luo_ser *ser; + + /* + * in case preservation failed, 
args->serialized_data will + * be NULL and kvm_luo_preserve takes care of cleaning up. + * If preserve succeeds, this condition fails and unpreserve + * function takes care of cleaning up. + */ + if (WARN_ON_ONCE(!args->serialized_data)) + return; + + ser = phys_to_virt(args->serialized_data); + + kvm_luo_unpreserve_mem_attrs(ser); + kho_unpreserve_free(ser); + vfree(mem_attrs); +} + +static void kvm_luo_finish(struct liveupdate_file_op_args *args) +{ + struct kvm_luo_ser *ser; + + /* + * If retrieve_status is true or set to error, nothing to do here. + * Already cleaned up in kvm_luo_retrieve(). + */ + if (args->retrieve_status) + return; + + if (!args->serialized_data) + return; + + ser = phys_to_virt(args->serialized_data); + kvm_luo_finish_mem_attrs(ser); + kho_restore_free(ser); +} + +static const struct liveupdate_file_ops kvm_luo_file_ops = { + .can_preserve = kvm_luo_can_preserve, + .preserve = kvm_luo_preserve, + .retrieve = kvm_luo_retrieve, + .unpreserve = kvm_luo_unpreserve, + .finish = kvm_luo_finish, + .owner = THIS_MODULE, +}; + +static struct liveupdate_file_handler kvm_luo_handler = { + .ops = &kvm_luo_file_ops, + .compatible = KVM_LUO_FH_COMPATIBLE, +}; + +static int __init kvm_luo_init(void) +{ + int err = liveupdate_register_file_handler(&kvm_luo_handler); + + if (err && err != -EOPNOTSUPP) { + pr_err("Could not register kvm_vm_luo handler: %pe\n", ERR_PTR(err)); + return err; + } + + return 0; +} +late_initcall(kvm_luo_init); -- 2.54.0.563.g4f69b47b94-goog From nobody Mon May 25 09:56:50 2026 
Date: Mon, 18 May 2026 09:36:35 +0000 X-Mailing-List: linux-kernel@vger.kernel.org X-Mailer: git-send-email 2.54.0.563.g4f69b47b94-goog Message-ID: <7ffc1fb8962fd27428ac3e2f3d8950492597e03b.1779080766.git.tarunsahu@google.com> Subject: [RFC PATCH v1 5/9] kvm: guest_memfd: Move internal definitions and helper to new header From: Tarun Sahu To: axelrasmussen@google.com, mark.rutland@arm.com, skhawaja@google.com, Mike Rapoport , sagis@google.com, Jason Gunthorpe , Shuah Khan , ackerleytng@google.com, corbet@lwn.net, dmatlack@google.com, Paolo Bonzini , Andrew Morton , vannapurve@google.com, Pratyush Yadav , david@redhat.com, 
aneesh.kumar@kernel.org, vipinsh@google.com, Alexander Graf , David Hildenbrand , Pasha Tatashin Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kexec@lists.infradead.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, Tarun Sahu Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" To support guest_memfd memory preservation with LUO, the guest_memfd LUO code needs to access guest_memfd internals and reconstruct guest_memfd file instances from preserved state. Extract gmem_file, gmem_inode, and the GMEM_I() helper from guest_memfd.c into a new internal header, virt/kvm/guest_memfd.h. Additionally, split __kvm_gmem_create() to expose a non-static __kvm_gmem_create_file() helper. This helper returns a struct file instead of a file descriptor, which allows a file to be created and initialized without installing it into a file descriptor table. Signed-off-by: Tarun Sahu --- virt/kvm/guest_memfd.c | 68 +++++++++++++++++------------------------- virt/kvm/guest_memfd.h | 39 ++++++++++++++++++++++++ 2 files changed, 67 insertions(+), 40 deletions(-) create mode 100644 virt/kvm/guest_memfd.h diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c index 69c9d6d546b2..6740ae2bf948 100644 --- a/virt/kvm/guest_memfd.c +++ b/virt/kvm/guest_memfd.c @@ -7,38 +7,12 @@ #include #include #include +#include "guest_memfd.h" #include "kvm_mm.h" static struct vfsmount *kvm_gmem_mnt; -/* - * A guest_memfd instance can be associated multiple VMs, each with its own - * "view" of the underlying physical memory. - * - * The gmem's inode is effectively the raw underlying physical storage, and is - * used to track properties of the physical memory, while each gmem file is - * effectively a single VM's view of that storage, and is used to track assets - * specific to its associated VM, e.g. memslots=>gmem bindings. 
- */ -struct gmem_file { - struct kvm *kvm; - struct xarray bindings; - struct list_head entry; -}; - -struct gmem_inode { - struct shared_policy policy; - struct inode vfs_inode; - struct list_head gmem_file_list; - - u64 flags; -}; - -static __always_inline struct gmem_inode *GMEM_I(struct inode *inode) -{ - return container_of(inode, struct gmem_inode, vfs_inode); -} #define kvm_gmem_for_each_file(f, inode) \ list_for_each_entry(f, &GMEM_I(inode)->gmem_file_list, entry) @@ -556,23 +530,17 @@ bool __weak kvm_arch_supports_gmem_init_shared(struct kvm *kvm) return true; } -static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags) +struct file *__kvm_gmem_create_file(struct kvm *kvm, loff_t size, u64 flags) { static const char *name = "[kvm-gmem]"; struct gmem_file *f; struct inode *inode; struct file *file; - int fd, err; - - fd = get_unused_fd_flags(0); - if (fd < 0) - return fd; + int err; f = kzalloc_obj(*f); - if (!f) { - err = -ENOMEM; - goto err_fd; - } + if (!f) + return ERR_PTR(-ENOMEM); /* __fput() will take care of fops_put(). 
*/ if (!fops_get(&kvm_gmem_fops)) { @@ -611,8 +579,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags) xa_init(&f->bindings); list_add(&f->entry, &GMEM_I(inode)->gmem_file_list); - fd_install(fd, file); - return fd; + return file; err_inode: iput(inode); @@ -620,7 +587,28 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags) fops_put(&kvm_gmem_fops); err_gmem: kfree(f); -err_fd: + return ERR_PTR(err); +} + +static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags) +{ + struct file *file; + int fd, err; + + fd = get_unused_fd_flags(0); + if (fd < 0) + return fd; + + file = __kvm_gmem_create_file(kvm, size, flags); + if (IS_ERR(file)) { + err = PTR_ERR(file); + goto err_put_fd; + } + + fd_install(fd, file); + return fd; + +err_put_fd: put_unused_fd(fd); return err; } diff --git a/virt/kvm/guest_memfd.h b/virt/kvm/guest_memfd.h new file mode 100644 index 000000000000..c528b046dd69 --- /dev/null +++ b/virt/kvm/guest_memfd.h @@ -0,0 +1,39 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef __KVM_GUEST_MEMFD_H__ +#define __KVM_GUEST_MEMFD_H__ 1 + +#include +#include +#include + +/* + * A guest_memfd instance can be associated multiple VMs, each with its own + * "view" of the underlying physical memory. + * + * The gmem's inode is effectively the raw underlying physical storage, and is + * used to track properties of the physical memory, while each gmem file is + * effectively a single VM's view of that storage, and is used to track assets + * specific to its associated VM, e.g. memslots=>gmem bindings. 
+ */ +struct gmem_file { + struct kvm *kvm; + struct xarray bindings; + struct list_head entry; +}; + +struct gmem_inode { + struct shared_policy policy; + struct inode vfs_inode; + struct list_head gmem_file_list; + + u64 flags; +}; + +static inline struct gmem_inode *GMEM_I(struct inode *inode) +{ + return container_of(inode, struct gmem_inode, vfs_inode); +} + +struct file *__kvm_gmem_create_file(struct kvm *kvm, loff_t size, u64 flags); + +#endif /* __KVM_GUEST_MEMFD_H__ */ -- 2.54.0.563.g4f69b47b94-goog From nobody Mon May 25 09:56:50 2026 
Date: Mon, 18 May 2026 09:36:36 +0000 X-Mailing-List: linux-kernel@vger.kernel.org X-Mailer: git-send-email 2.54.0.563.g4f69b47b94-goog Message-ID: <823a5f2af1d794c61560d0bad96863c4c851f29b.1779080766.git.tarunsahu@google.com> Subject: [RFC PATCH v1 6/9] kvm: guest_memfd: Add support for freezing and unfreezing mappings From: Tarun Sahu To: axelrasmussen@google.com, mark.rutland@arm.com, skhawaja@google.com, Mike Rapoport , sagis@google.com, Jason Gunthorpe , Shuah Khan , ackerleytng@google.com, corbet@lwn.net, dmatlack@google.com, Paolo Bonzini , Andrew Morton , vannapurve@google.com, Pratyush Yadav , david@redhat.com, aneesh.kumar@kernel.org, vipinsh@google.com, Alexander Graf , David Hildenbrand , Pasha Tatashin Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kexec@lists.infradead.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, Tarun Sahu Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Introduce a freeze flag on gmem_inode that makes fallocate() fail with -EPERM while the inode is frozen. This prevents modification of the gmem file while it is being preserved. SRCU is used to synchronize the freeze: the writer (the freeze call) blocks until all in-flight readers have finished, and readers are re-entrant. This can be extended to freeze the fault path as well, but currently a fault failure caused by a sudden freeze might be fatal to the running guest. 
Signed-off-by: Tarun Sahu --- virt/kvm/guest_memfd.c | 112 +++++++++++++++++++++++++++++++++++++---- virt/kvm/guest_memfd.h | 5 ++ 2 files changed, 106 insertions(+), 11 deletions(-) diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c index 6740ae2bf948..91e42f717286 100644 --- a/virt/kvm/guest_memfd.c +++ b/virt/kvm/guest_memfd.c @@ -7,11 +7,13 @@ #include #include #include +#include #include "guest_memfd.h" #include "kvm_mm.h" static struct vfsmount *kvm_gmem_mnt; +static struct srcu_struct kvm_gmem_freeze_srcu; #define kvm_gmem_for_each_file(f, inode) \ @@ -96,6 +98,7 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index) /* TODO: Support huge pages. */ struct mempolicy *policy; struct folio *folio; + int idx; /* * Fast-path: See if folio is already present in mapping to avoid @@ -273,16 +276,30 @@ static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len) static long kvm_gmem_fallocate(struct file *file, int mode, loff_t offset, loff_t len) { + struct inode *inode = file_inode(file); int ret; + int idx; - if (!(mode & FALLOC_FL_KEEP_SIZE)) - return -EOPNOTSUPP; + idx = srcu_read_lock(&kvm_gmem_freeze_srcu); + if (kvm_gmem_is_frozen(inode)) { + srcu_read_unlock(&kvm_gmem_freeze_srcu, idx); + return -EPERM; + } - if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) - return -EOPNOTSUPP; + if (!(mode & FALLOC_FL_KEEP_SIZE)) { + ret = -EOPNOTSUPP; + goto out; + } - if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len)) - return -EINVAL; + if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) { + ret = -EOPNOTSUPP; + goto out; + } + + if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len)) { + ret = -EINVAL; + goto out; + } if (mode & FALLOC_FL_PUNCH_HOLE) ret = kvm_gmem_punch_hole(file_inode(file), offset, len); @@ -291,6 +308,9 @@ static long kvm_gmem_fallocate(struct file *file, int mode, loff_t offset, if (!ret) file_modified(file); + +out: + 
srcu_read_unlock(&kvm_gmem_freeze_srcu, idx); return ret; } @@ -944,7 +964,9 @@ static void kvm_gmem_destroy_inode(struct inode *inode) static void kvm_gmem_free_inode(struct inode *inode) { - kmem_cache_free(kvm_gmem_inode_cachep, GMEM_I(inode)); + struct gmem_inode *gi = GMEM_I(inode); + + kmem_cache_free(kvm_gmem_inode_cachep, gi); } static const struct super_operations kvm_gmem_super_operations = { @@ -1001,12 +1023,21 @@ int kvm_gmem_init(struct module *module) if (!kvm_gmem_inode_cachep) return -ENOMEM; + ret = init_srcu_struct(&kvm_gmem_freeze_srcu); + if (ret) + goto err_cache; + ret = kvm_gmem_init_mount(); - if (ret) { - kmem_cache_destroy(kvm_gmem_inode_cachep); - return ret; - } + if (ret) + goto err_srcu; + return 0; + +err_srcu: + cleanup_srcu_struct(&kvm_gmem_freeze_srcu); +err_cache: + kmem_cache_destroy(kvm_gmem_inode_cachep); + return ret; } void kvm_gmem_exit(void) @@ -1014,5 +1045,64 @@ void kvm_gmem_exit(void) kern_unmount(kvm_gmem_mnt); kvm_gmem_mnt = NULL; rcu_barrier(); + cleanup_srcu_struct(&kvm_gmem_freeze_srcu); kmem_cache_destroy(kvm_gmem_inode_cachep); } + +/** + * kvm_gmem_freeze - Freeze or unfreeze a guest_memfd inode mapping. + * @inode: The guest_memfd inode. + * @freeze: True to freeze, false to unfreeze. + * + * This API is used strictly during the live update / preservation transition + * window to prevent host userspace and guest-side faults from making any + * mapping modifications (such as fallocate or page fault allocation) + * to the guest_memfd page cache. + * + * NOTE: Currently it is only checked in the fallocate path. The page fault path is + * NOT touched. 
+ * + * Synchronization Strategy (Sleepable RCU): + * To avoid high-contention VFS locks (like inode_lock or filemap_invalidate_lock) + * on the vCPU page fault hot paths, this subsystem implements a lightweight, + * system-wide Sleepable RCU (SRCU) mechanism (`kvm_gmem_freeze_srcu`): + * + * Currently the freeze is checked only in fallocate, but it may also be needed + * in the fault path in the future to completely freeze the inode. + * + * Global vs. Per-Inode SRCU: + * A single system-wide global static `srcu_struct` is used instead of a per-inode + * SRCU structure to completely prevent unprivileged users from exhausting the + * host's per-CPU memory allocator. Because `init_srcu_struct()` allocates per-CPU + * memory via `alloc_percpu()`, which is not accounted by memory cgroups (memcg), + * a per-inode SRCU structure would allow a tenant to bypass cgroup limits and + * trigger a system-wide Out-of-Memory (OOM) crash simply by spawning a large + * number of guest_memfd file descriptors (bounded only by RLIMIT_NOFILE). + * + * Flag Modification Note: + * Since `GUEST_MEMFD_F_MAPPING_FROZEN` is the ONLY flag in `GMEM_I(inode)->flags` + * that is mutated dynamically at runtime (all other flags are creation-time flags + * which remain strictly read-only), there is no possibility of concurrent bit- + * modification races. Therefore, a standard `WRITE_ONCE` is fully safe and + * does not require complex `cmpxchg` synchronization loops. 
+ * + */ +void kvm_gmem_freeze(struct inode *inode, bool freeze) +{ + u64 flags =3D READ_ONCE(GMEM_I(inode)->flags); + + if (freeze) + flags |=3D GUEST_MEMFD_F_MAPPING_FROZEN; + else + flags &=3D ~GUEST_MEMFD_F_MAPPING_FROZEN; + + WRITE_ONCE(GMEM_I(inode)->flags, flags); + + if (freeze) + synchronize_srcu(&kvm_gmem_freeze_srcu); +} + +bool kvm_gmem_is_frozen(struct inode *inode) +{ + return READ_ONCE(GMEM_I(inode)->flags) & GUEST_MEMFD_F_MAPPING_FROZEN; +} diff --git a/virt/kvm/guest_memfd.h b/virt/kvm/guest_memfd.h index c528b046dd69..028c348a1023 100644 --- a/virt/kvm/guest_memfd.h +++ b/virt/kvm/guest_memfd.h @@ -29,11 +29,16 @@ struct gmem_inode { u64 flags; }; =20 +/* Internal kernel-only flags (must not overlap with UAPI flags) */ +#define GUEST_MEMFD_F_MAPPING_FROZEN (1ULL << 63) + static inline struct gmem_inode *GMEM_I(struct inode *inode) { return container_of(inode, struct gmem_inode, vfs_inode); } =20 struct file *__kvm_gmem_create_file(struct kvm *kvm, loff_t size, u64 flag= s); +void kvm_gmem_freeze(struct inode *inode, bool freeze); +bool kvm_gmem_is_frozen(struct inode *inode); =20 #endif /* __KVM_GUEST_MEMFD_H__ */ --=20 2.54.0.563.g4f69b47b94-goog From nobody Mon May 25 09:56:50 2026 Received: from mail-ej1-f73.google.com (mail-ej1-f73.google.com [209.85.218.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9C5533E9C39 for ; Mon, 18 May 2026 09:36:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779097014; cv=none; b=tB3D3SS3AKdSs2gg83+U6X7pRIOUHykxkD0Qhp3whnadHccT/evrP/jOMbg6fjiPd1LyPy4UYvx3fJ6VQoS//UqX9/Ibf7qkZdpOA1DZ2YKSoHBKPsM/cGvhVCHuoghXNpvzXD5a4k3cjqPVAyu4iZ5exURWi0ib/gUZ5NwsiRA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779097014; c=relaxed/simple; 
bh=ejBb0SYl6zXB44TKLfFrfIt7SEHIRghpBzn0BwO26w8=; b=A17oVUb8iMqvjbOWJr2gM2Y+Fuc2wKW5sBSO2UqJAZ8WfWwBdgdNyWquFhj8HNubKb qLdLiS0gDnZaQ9A9/zHCd8hvei9xWunxWi8YnOVU7vSZiSbp+GUG3aUGdscv1s5ij+c3 8qve6PcgsjdXa9CY0B+vInvZ/iy5jf5wlzhB0C9vZSt5raK//5grDEHzU3oMO/ymhy4z AwYCgr25Kmax/GPskw3vL2TTPAxWTJ30RpWiK8fRhKRBcebyaqX5mqw46ctH9FKxoHBq 5HFgYMRrwpcDSFtTenY/owFGSICt7ysMNZaMm9JDAMqW5o507aKNl7z+wSp2jez+82HR hfBg== X-Gm-Message-State: AOJu0YxxvMfYJgGQpSPL+EUfTJXz4wXm4GVMazC05qv7YmR7KD23ePfy qj9iT4eY5ZlPHO12lSvsWPuD1Tmr9OECJcRkLiURkbpHc9lGM8vaBlGvKAkpj7V3vnLu3EvRHhE WzMoJyDbJ58QJUC4FKw== X-Received: from edgi7-n1.prod.google.com ([2002:a05:6402:a587:10b0:67c:76a4:f702]) (user=tarunsahu job=prod-delivery.src-stubby-dispatcher) by 2002:a17:907:3d89:b0:bd5:7c2:42bf with SMTP id a640c23a62f3a-bd517af6eb4mr598467566b.49.1779097009480; Mon, 18 May 2026 02:36:49 -0700 (PDT) Date: Mon, 18 May 2026 09:36:37 +0000 In-Reply-To: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.54.0.563.g4f69b47b94-goog Message-ID: <4d444f77f973f610386b78c5689b9726dc5f4c73.1779080766.git.tarunsahu@google.com> Subject: [RFC PATCH v1 7/9] kvm: guest_memfd_luo: add support for guest_memfd preservation From: Tarun Sahu To: axelrasmussen@google.com, mark.rutland@arm.com, skhawaja@google.com, Mike Rapoport , sagis@google.com, Jason Gunthorpe , Shuah Khan , ackerleytng@google.com, corbet@lwn.net, dmatlack@google.com, Paolo Bonzini , Andrew Morton , vannapurve@google.com, Pratyush Yadav , david@redhat.com, aneesh.kumar@kernel.org, vipinsh@google.com, Alexander Graf , David Hildenbrand , Pasha Tatashin Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kexec@lists.infradead.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, Tarun Sahu Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This patch sets up the basic infrastructure to preserve the guest_memfd. 
Currently this supports only fully shared guest_memfd (INIT_SHARED) that is pre-faulted and backed by PAGE_SIZE pages.

It registers a new LUO file handler for guest_memfd files to serialize and deserialize guest memory. This allows guest memory backed by guest_memfd to be preserved across live updates, so that guest instances can be resumed without losing their memory contents.

Preservation is straightforward: it walks the folios in the page cache and serializes them. The preserve callback calls kvm_gmem_freeze(), which freezes the guest_memfd inode and prevents fallocate calls from changing the inode mapping during or after preservation. The freeze does not need to be checked in the page fault path, because preservation is only supported for pre-faulted/pre-allocated guest_memfd.

Retrieving the guest_memfd requires a struct kvm to create the new guest_memfd. The retrieve callback therefore first gets the vm_file from the same session, using the VM token recorded in the serialized state, and obtains the struct kvm from it.

This change also updates the MAINTAINERS list.

Signed-off-by: Tarun Sahu
---
Also, I wanted to use the same LUO file handler compatible string for guest_memfd_luo as for kvm_luo (KVM_LUO_FH_COMPATIBLE), but unfortunately the LUO design does not permit this: every handler must be registered with a separate string.
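For reviewers, here is a minimal userspace sketch of the eligibility rule and the per-folio record described above. It is not the code in this patch: every `sketch_` name, the 4 KiB page size, and the `struct sketch_folio` stand-in for a page-cache entry are assumptions made for illustration. It only models the checks the description states (non-zero, page-aligned size with every page already present) and the packing of pfn, index and an up-to-date flag into one record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 4 KiB pages; the real code uses the kernel's PAGE_SHIFT. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE (1ULL << SKETCH_PAGE_SHIFT)
#define SKETCH_FOLIO_UPTODATE (1u << 0)

/* Hypothetical userspace stand-in for the serialized per-folio record. */
struct sketch_folio_ser {
	uint64_t pfn : 52;	/* page frame number */
	uint64_t flags : 12;	/* per-folio state flags */
	uint64_t index;		/* page offset within the file */
};

/* Hypothetical stand-in for one page-cache entry. */
struct sketch_folio {
	uint64_t pfn;
	uint64_t index;
	bool uptodate;
};

/*
 * Eligibility as described: the size must be non-zero and page-aligned,
 * and the number of folios present must cover every page of the file.
 */
static bool sketch_can_preserve(uint64_t size, size_t nr_present)
{
	if (size == 0 || (size & (SKETCH_PAGE_SIZE - 1)))
		return false;
	return nr_present == (size >> SKETCH_PAGE_SHIFT);
}

/*
 * Fill one record per folio. Returns the number of records written,
 * or -1 if the file is not eligible for preservation.
 */
static long sketch_serialize(const struct sketch_folio *folios, size_t nr,
			     uint64_t size, struct sketch_folio_ser *out)
{
	if (!sketch_can_preserve(size, nr))
		return -1;

	for (size_t i = 0; i < nr; i++) {
		/* The pfn must fit in the 52-bit field. */
		assert(folios[i].pfn < (1ULL << 52));
		out[i].pfn = folios[i].pfn;
		out[i].index = folios[i].index;
		out[i].flags = folios[i].uptodate ? SKETCH_FOLIO_UPTODATE : 0;
	}
	return (long)nr;
}
```

A caller would pass an array with one entry per page of the file; a partially populated file is rejected before anything is written, which mirrors the pre-faulted requirement stated above.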
--- MAINTAINERS | 1 + include/linux/kho/abi/kvm.h | 79 +++++- virt/kvm/Makefile.kvm | 2 +- virt/kvm/guest_memfd_luo.c | 495 ++++++++++++++++++++++++++++++++++++ 4 files changed, 570 insertions(+), 7 deletions(-) create mode 100644 virt/kvm/guest_memfd_luo.c diff --git a/MAINTAINERS b/MAINTAINERS index 2c26eb17bc0a..e5402a56ab98 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -14413,6 +14413,7 @@ L: kexec@lists.infradead.org L: kvm@vger.kernel.org S: Maintained T: git git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git +F: virt/kvm/guest_memfd_luo.c F: virt/kvm/kvm_luo.c =20 KVM PARAVIRT (KVM/paravirt) diff --git a/include/linux/kho/abi/kvm.h b/include/linux/kho/abi/kvm.h index 31bd39588bdd..fcdec609a41e 100644 --- a/include/linux/kho/abi/kvm.h +++ b/include/linux/kho/abi/kvm.h @@ -9,20 +9,23 @@ #define _LINUX_KHO_ABI_KVM_H =20 #include +#include #include =20 /** - * DOC: KVM Live Update ABI + * DOC: KVM and guest_memfd Live Update ABI * - * KVM uses the ABI defined below for preserving its state + * KVM and guest_memfd use the ABI defined below for preserving their stat= es * across a kexec reboot using the LUO. * - * The state is serialized into a packed structure `struct kvm_luo_ser` - * which is handed over to the next kernel via the KHO mechanism. + * The state is serialized into packed structures (struct kvm_luo_ser and + * struct guest_memfd_luo_ser) which are handed over to the next kernel via + * the KHO mechanism. * - * This interface is a contract. Any modification to the structure layout + * This interface is a contract. Any modification to the structure layouts * constitutes a breaking change. Such changes require incrementing the - * version number in the KVM_LUO_FH_COMPATIBLE compatibility string. + * version number in the KVM_LUO_FH_COMPATIBLE or + * GUEST_MEMFD_LUO_FH_COMPATIBLE compatibility strings. 
*/ =20 /** @@ -51,4 +54,68 @@ struct kvm_luo_ser { /* The compatibility string for KVM VM file handler */ #define KVM_LUO_FH_COMPATIBLE "kvm_vm_luo_v1" =20 +/** + * struct guest_memfd_luo_folio_ser - Serialization layout for a single fo= lio in guest_memfd. + * @pfn: Page Frame Number of the folio. + * @index: Page offset of the folio within the file. + * @flags: State flags associated with the folio. + */ +struct guest_memfd_luo_folio_ser { + u64 pfn:52; + u64 flags:12; + u64 index; +} __packed; + +/** + * GUEST_MEMFD_LUO_FOLIO_UPTODATE - The folio is up-to-date. + * + * This flag is per folio to check if the folio is uptodate. + */ +#define GUEST_MEMFD_LUO_FOLIO_UPTODATE BIT(0) + + +/** + * GUEST_MEMFD_LUO_FLAG_MMAP - The guest_memfd supports mmap. + * + * This flag indicates that the guest_memfd supports host-side mmap. + */ +#define GUEST_MEMFD_LUO_FLAG_MMAP BIT(0) + +/** + * GUEST_MEMFD_LUO_FLAG_INIT_SHARED - Initialize memory as shared. + * + * This flag indicates that the guest_memfd has been initialized as shared + * memory. + */ +#define GUEST_MEMFD_LUO_FLAG_INIT_SHARED BIT(1) + +/** + * GUEST_MEMFD_LUO_SUPPORTED_FLAGS - Supported guest_memfd LUO flags mask. + * + * A mask of all guest_memfd preservation flags supported by this version + * of the KVM LUO ABI. + */ +#define GUEST_MEMFD_LUO_SUPPORTED_FLAGS (GUEST_MEMFD_LUO_FLAG_MMAP | \ + GUEST_MEMFD_LUO_FLAG_INIT_SHARED) + +/** + * struct guest_memfd_luo_ser - Main serialization structure for guest_mem= fd. + * @size: The size of the file in bytes. + * @flags: File-level flags. + * @nr_folios: Number of folios in the folios array. + * @vm_token: Token of the associated KVM VM instance. + * @folios: KHO vmalloc descriptor pointing to the array of + * struct guest_memfd_luo_folio_ser. 
+ */ +struct guest_memfd_luo_ser { + u64 size; + u64 flags; + u64 nr_folios; + u64 vm_token; + struct kho_vmalloc folios; +} __packed; + +/* The compatibility string for GUEST_MEMFD file handler */ +#define GUEST_MEMFD_LUO_FH_COMPATIBLE "guest_memfd_luo_v1" + #endif /* _LINUX_KHO_ABI_KVM_H */ diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm index c1a962159264..d30fca094c42 100644 --- a/virt/kvm/Makefile.kvm +++ b/virt/kvm/Makefile.kvm @@ -13,4 +13,4 @@ kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) +=3D $(KVM)/irqchip.o kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) +=3D $(KVM)/dirty_ring.o kvm-$(CONFIG_HAVE_KVM_PFNCACHE) +=3D $(KVM)/pfncache.o kvm-$(CONFIG_KVM_GUEST_MEMFD) +=3D $(KVM)/guest_memfd.o -kvm-$(CONFIG_LIVEUPDATE_GUEST_MEMFD) +=3D $(KVM)/kvm_luo.o +kvm-$(CONFIG_LIVEUPDATE_GUEST_MEMFD) +=3D $(KVM)/guest_memfd_luo.o $(KVM)/= kvm_luo.o diff --git a/virt/kvm/guest_memfd_luo.c b/virt/kvm/guest_memfd_luo.c new file mode 100644 index 000000000000..66b931eafc82 --- /dev/null +++ b/virt/kvm/guest_memfd_luo.c @@ -0,0 +1,495 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * Copyright (c) 2026, Google LLC. + * Tarun Sahu + * + * Guestmemfd Preservation for Live Update Orchestrator (LUO) + */ + +/** + * DOC: Guestmemfd Preservation via LUO + * + * Overview + * =3D=3D=3D=3D=3D=3D=3D=3D + * + * Guest memory file descriptors (guest_memfd) can be preserved over a kex= ec + * reboot using the Live Update Orchestrator (LUO) file preservation. This + * allows userspace to preserve VM memory across kexec reboots. + * + * The preservation is not intended to be transparent. Only select propert= ies + * of the guest_memfd are preserved, while others are reset to default. + * + * .. note:: + * Currently, only guest_memfd backed by standard system page size + * (PAGE_SIZE) is supported. Huge pages are not supported. 
+ * + * Preserved Properties + * =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + * + * The following properties of guest_memfd are preserved across kexec: + * + * File Size + * The size of the file is preserved. + * + * File Contents + * All folios present in the page cache are preserved. + * + * File-level Flags + * The file-level flags (such as MMAP support and INIT_SHARED default ma= pping) + * are preserved. + * + * Non-Preserved Properties + * =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D + * + * NUMA Memory Policy + * NUMA memory policies associated with the guest_memfd are not preserve= d. + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "guest_memfd.h" + +static int kvm_gmem_luo_walk_folios(struct address_space *mapping, + pgoff_t end_index, struct guest_memfd_luo_folio_ser *folios_ser, + u64 *out_count) +{ + struct folio_batch fbatch; + pgoff_t index =3D 0; + u64 count =3D 0; + int err =3D 0; + + folio_batch_init(&fbatch); + while (index < end_index) { + unsigned int nr, i; + + nr =3D filemap_get_folios(mapping, &index, end_index - 1, &fbatch); + if (nr =3D=3D 0) + break; + + for (i =3D 0; i < nr; i++) { + struct folio *folio =3D fbatch.folios[i]; + + if (folios_ser) { + if (folio_test_hwpoison(folio)) { + err =3D -EHWPOISON; + folio_batch_release(&fbatch); + goto out; + } + err =3D kho_preserve_folio(folio); + if (err) { + folio_batch_release(&fbatch); + goto out; + } + + folios_ser[count].pfn =3D folio_pfn(folio); + folios_ser[count].index =3D folio->index; + folios_ser[count].flags =3D folio_test_uptodate(folio) ? 
+ GUEST_MEMFD_LUO_FOLIO_UPTODATE : 0; + } + count++; + } + folio_batch_release(&fbatch); + cond_resched(); + } + +out: + *out_count =3D count; + return err; +} + +static bool kvm_gmem_luo_can_preserve(struct liveupdate_file_handler *hand= ler, struct file *file) +{ + struct inode *inode =3D file_inode(file); + u64 count =3D 0; + pgoff_t end_index; + long size; + + if (inode->i_sb->s_magic !=3D GUEST_MEMFD_MAGIC) + return 0; + + if (!(GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_INIT_SHARED)) + return 0; + + if (mapping_large_folio_support(inode->i_mapping)) + return 0; + + size =3D i_size_read(inode); + if (!size) + return 0; + + if (size & (PAGE_SIZE - 1)) + return 0; + + end_index =3D size >> PAGE_SHIFT; + + if (kvm_gmem_luo_walk_folios(inode->i_mapping, end_index, NULL, &count)) + return 0; + + if (count !=3D end_index) + return 0; + + return 1; +} + +static int kvm_gmem_luo_preserve(struct liveupdate_file_op_args *args) +{ + struct guest_memfd_luo_folio_ser *folios_ser; + u64 count, gmem_flags, abi_flags =3D 0; + struct guest_memfd_luo_ser *ser; + struct address_space *mapping; + struct gmem_file *gmem_file; + struct inode *inode; + pgoff_t end_index; + struct kvm *kvm; + int err =3D 0; + long size; + + inode =3D file_inode(args->file); + kvm_gmem_freeze(inode, true); + + mapping =3D inode->i_mapping; + size =3D i_size_read(inode); + if (!size) { + err =3D 0; + goto err_unfreeze_inode; + } + + if (WARN_ON_ONCE(size & (PAGE_SIZE - 1))) { + err =3D -EINVAL; + goto err_unfreeze_inode; + } + + gmem_file =3D args->file->private_data; + kvm =3D gmem_file->kvm; + + gmem_flags =3D READ_ONCE(GMEM_I(inode)->flags); + if (gmem_flags & ~(GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED + | GUEST_MEMFD_F_MAPPING_FROZEN)) { + err =3D -EOPNOTSUPP; + goto err_unfreeze_inode; + } + + if (gmem_flags & GUEST_MEMFD_FLAG_MMAP) + abi_flags |=3D GUEST_MEMFD_LUO_FLAG_MMAP; + if (gmem_flags & GUEST_MEMFD_FLAG_INIT_SHARED) + abi_flags |=3D GUEST_MEMFD_LUO_FLAG_INIT_SHARED; + + end_index 
=3D size >> PAGE_SHIFT; + + ser =3D kho_alloc_preserve(sizeof(*ser)); + if (IS_ERR(ser)) { + err =3D PTR_ERR(ser); + goto err_unfreeze_inode; + } + + folios_ser =3D vcalloc(end_index, sizeof(*folios_ser)); + if (!folios_ser) { + err =3D -ENOMEM; + goto err_free_ser; + } + + /* Walk: Fill the metadata array and preserve folios */ + err =3D kvm_gmem_luo_walk_folios(mapping, end_index, folios_ser, &count); + if (err) + goto err_unpreserve_unlocked; + + if (WARN_ON_ONCE(count !=3D end_index)) { + err =3D -EINVAL; + goto err_unpreserve_unlocked; + } + + ser->size =3D size; + ser->flags =3D abi_flags; + ser->nr_folios =3D count; + ser->vm_token =3D 0; // It will be set during the kvm_gmem_luo_freeze() + + err =3D kho_preserve_vmalloc(folios_ser, &ser->folios); + if (err) + goto err_unpreserve_unlocked; + + args->serialized_data =3D virt_to_phys(ser); + args->private_data =3D folios_ser; + + return 0; + +err_unpreserve_unlocked: + for (long i =3D count - 1; i >=3D 0; i--) { + struct folio *folio =3D pfn_folio(folios_ser[i].pfn); + + kho_unpreserve_folio(folio); + } + vfree(folios_ser); +err_free_ser: + kho_unpreserve_free(ser); +err_unfreeze_inode: + kvm_gmem_freeze(inode, false); + return err; +} + +static int kvm_gmem_luo_freeze(struct liveupdate_file_op_args *args) +{ + struct guest_memfd_luo_ser *ser; + struct gmem_file *gmem_file; + struct kvm *kvm; + struct file *kvm_file; + u64 vm_token; + int err; + + if (WARN_ON_ONCE(!args->serialized_data)) + return -EINVAL; + + ser =3D phys_to_virt(args->serialized_data); + if (!ser) + return -EINVAL; + + gmem_file =3D args->file->private_data; + kvm =3D gmem_file->kvm; + + /* + * Obtain a strong reference to kvm->vm_file to prevent the SLAB_TYPESAFE= _BY_RCU + * file memory from being reallocated while it is being processed. 
+ */ + kvm_file =3D get_file_active(&kvm->vm_file); + if (!kvm_file) + return -ENOENT; + + err =3D liveupdate_get_token_outgoing(args->session, kvm_file, &vm_token); + fput(kvm_file); + if (err) + return err; + + ser->vm_token =3D vm_token; + return 0; +} + +static void kvm_gmem_luo_discard_folios( + const struct guest_memfd_luo_folio_ser *folios_ser, + u64 nr_folios, u64 start_idx) +{ + long i; + + for (i =3D start_idx; i < nr_folios; i++) { + struct folio *folio; + phys_addr_t phys; + + if (!folios_ser[i].pfn) + continue; + + phys =3D PFN_PHYS(folios_ser[i].pfn); + folio =3D kho_restore_folio(phys); + if (folio) + folio_put(folio); + } +} + +static void kvm_gmem_luo_unpreserve(struct liveupdate_file_op_args *args) +{ + struct guest_memfd_luo_folio_ser *folios_ser =3D args->private_data; + struct guest_memfd_luo_ser *ser; + long i; + + if (WARN_ON_ONCE(!args->serialized_data)) + return; + + ser =3D phys_to_virt(args->serialized_data); + if (!ser) + return; + + if (ser->nr_folios > 0) + kho_unpreserve_vmalloc(&ser->folios); + for (i =3D ser->nr_folios - 1; i >=3D 0; i--) { + struct folio *folio; + + if (!folios_ser[i].pfn) + continue; + + folio =3D pfn_folio(folios_ser[i].pfn); + kho_unpreserve_folio(folio); + } + vfree(folios_ser); + + kho_unpreserve_free(ser); + kvm_gmem_freeze(file_inode(args->file), false); +} + +static int kvm_gmem_luo_retrieve(struct liveupdate_file_op_args *args) +{ + struct guest_memfd_luo_folio_ser *folios_ser =3D NULL; + struct guest_memfd_luo_ser *ser; + struct kvm *kvm =3D NULL; + struct file *vm_file; + struct inode *inode; + struct file *file; + u64 gmem_flags =3D 0; + int err =3D 0; + long i =3D 0; + + if (!args->serialized_data) + return -EINVAL; + + ser =3D phys_to_virt(args->serialized_data); + if (!ser) + return -EINVAL; + + if (ser->flags & ~GUEST_MEMFD_LUO_SUPPORTED_FLAGS) { + err =3D -EOPNOTSUPP; + goto err_free_ser; + } + + if (ser->flags & GUEST_MEMFD_LUO_FLAG_MMAP) + gmem_flags |=3D GUEST_MEMFD_FLAG_MMAP; + if (ser->flags & 
GUEST_MEMFD_LUO_FLAG_INIT_SHARED) + gmem_flags |=3D GUEST_MEMFD_FLAG_INIT_SHARED; + + err =3D liveupdate_get_file_incoming(args->session, ser->vm_token, &vm_fi= le); + if (err) { + pr_warn("gmem: provided VM FD token (%llx) on preserve is incorrect\n", + ser->vm_token); + goto err_free_ser; + } + + if (file_is_kvm(vm_file)) + kvm =3D vm_file->private_data; + + /* + * Release the temporary reference taken by the liveupdate_get_file_incom= ing + * call. LUO still holds a reference. + */ + fput(vm_file); + + if (!kvm) { + err =3D -EINVAL; + goto err_free_ser; + } + + file =3D __kvm_gmem_create_file(kvm, ser->size, gmem_flags); + if (IS_ERR(file)) { + err =3D PTR_ERR(file); + goto err_free_ser; + } + + inode =3D file_inode(file); + + if (ser->nr_folios) { + folios_ser =3D kho_restore_vmalloc(&ser->folios); + if (!folios_ser) { + err =3D -EINVAL; + goto err_destroy_file; + } + + for (i =3D 0; i < ser->nr_folios; i++) { + struct folio *folio; + phys_addr_t phys; + + if (!folios_ser[i].pfn) + continue; + + phys =3D PFN_PHYS(folios_ser[i].pfn); + folio =3D kho_restore_folio(phys); + if (!folio) { + pr_err("gmem: failed to restore folio at %llx\n", phys); + err =3D -EIO; + goto err_put_remaining_folios; + } + + err =3D filemap_add_folio(inode->i_mapping, folio, folios_ser[i].index, + GFP_KERNEL); + if (err) { + pr_err("gmem: failed to add folio to page cache\n"); + folio_put(folio); + goto err_put_remaining_folios; + } + + if (folios_ser[i].flags & GUEST_MEMFD_LUO_FOLIO_UPTODATE) + folio_mark_uptodate(folio); + folio_unlock(folio); + folio_put(folio); + } + vfree(folios_ser); + } + + args->file =3D file; + kho_restore_free(ser); + return 0; + +err_put_remaining_folios: + i++; +err_destroy_file: + fput(file); +err_free_ser: + if (ser->nr_folios) { + if (!folios_ser) + folios_ser =3D kho_restore_vmalloc(&ser->folios); + if (folios_ser) { + kvm_gmem_luo_discard_folios(folios_ser, ser->nr_folios, i); + vfree(folios_ser); + } + } + kho_restore_free(ser); + return err; +} + 
+static void kvm_gmem_luo_finish(struct liveupdate_file_op_args *args) +{ + struct guest_memfd_luo_ser *ser; + struct guest_memfd_luo_folio_ser *folios_ser; + + /* Nothing to be done here, if retrieve_status was successful or errored, + * Cleanup is taken care of in retrieval call. + */ + if (args->retrieve_status) + return; + + if (!args->serialized_data) + return; + + ser =3D phys_to_virt(args->serialized_data); + if (!ser) + return; + + if (ser->nr_folios) { + folios_ser =3D kho_restore_vmalloc(&ser->folios); + if (folios_ser) { + kvm_gmem_luo_discard_folios(folios_ser, ser->nr_folios, 0); + vfree(folios_ser); + } + } + + kho_restore_free(ser); +} + +static const struct liveupdate_file_ops kvm_gmem_luo_file_ops =3D { + .can_preserve =3D kvm_gmem_luo_can_preserve, + .preserve =3D kvm_gmem_luo_preserve, + .freeze =3D kvm_gmem_luo_freeze, + .retrieve =3D kvm_gmem_luo_retrieve, + .unpreserve =3D kvm_gmem_luo_unpreserve, + .finish =3D kvm_gmem_luo_finish, + .owner =3D THIS_MODULE, +}; + +static struct liveupdate_file_handler kvm_gmem_luo_handler =3D { + .ops =3D &kvm_gmem_luo_file_ops, + .compatible =3D GUEST_MEMFD_LUO_FH_COMPATIBLE, +}; + +static int __init kvm_gmem_luo_init(void) +{ + int err =3D liveupdate_register_file_handler(&kvm_gmem_luo_handler); + + if (err && err !=3D -EOPNOTSUPP) { + pr_err("Could not register luo filesystem handler: %pe\n", ERR_PTR(err)); + return err; + } + + return 0; +} +late_initcall(kvm_gmem_luo_init); --=20 2.54.0.563.g4f69b47b94-goog From nobody Mon May 25 09:56:50 2026 Received: from mail-ej1-f74.google.com (mail-ej1-f74.google.com [209.85.218.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 35ECC3EB7FF for ; Mon, 18 May 2026 09:36:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779097016; 
x7ECNs+TPE4Y4y8eBCKiE9VZ9tCyWImxK5D90TePidzqAHD8vW2F2aFj4MO9Bpnt7SA0 g9Mw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779097011; x=1779701811; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ZJXHhvWcO8WVbNSPa5FuUDQ4B9keuIjD7ZJ/qh+m6Pg=; b=RFtpRbdQPwVvgrXX0q8SM5Mc/ULXXJdQgutteU5+DFxIfeoCFmtJYmMIOS6e0vTxCJ m77uvD6dleKNxb7XCHIxRm/mKeCG2UYR5YZ1azUSNCcC6Kh9cmz4UsffSbY+CgulbB9d WbLc662GRf4m83s3EGoLkwoqjL9lU9IhgHXJ4GIk2MJ/RaaV2ttZZZC5gp6tiMg0IGWH xUdIouzH+cqkrXI9ADqMfCGMJCEoGF/q9FRr6rF2krP0yp9AqAOS1Ro2NY/hkLB+ll3/ 3yWIZp0PtQiDJG26ovtsq1rZKRw0ZrLHVJ3GNOUAZRbGOAKfiZ06gHMI09kRB28sClQt Z9Qg== X-Gm-Message-State: AOJu0YwRFB6NymJKprFiA/BjHDDZgcZX+wJmTuV+6iN5r1ZMLDKSWXkM NgfeFHxOyHrjsXIhrHYD7E333tkX/H6DfMw0yfkuSaFnDFq16ejBvVDW3cCmttNWKZIwN3N1vwB Zrf0kuymCDxuM4QrXQQ== X-Received: from edcc7.prod.google.com ([2002:a05:6402:1f87:b0:67c:a74a:6214]) (user=tarunsahu job=prod-delivery.src-stubby-dispatcher) by 2002:a17:907:1b1b:b0:b9c:3d56:e4ec with SMTP id a640c23a62f3a-bd51797a576mr551071666b.24.1779097010457; Mon, 18 May 2026 02:36:50 -0700 (PDT) Date: Mon, 18 May 2026 09:36:38 +0000 In-Reply-To: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.54.0.563.g4f69b47b94-goog Message-ID: <8bf92f738d3456ea71ffe2a8406714e0d2afa114.1779080766.git.tarunsahu@google.com> Subject: [RFC PATCH v1 8/9] selftests: kvm: Split ____vm_create() to expose init helpers From: Tarun Sahu To: axelrasmussen@google.com, mark.rutland@arm.com, skhawaja@google.com, Mike Rapoport , sagis@google.com, Jason Gunthorpe , Shuah Khan , ackerleytng@google.com, corbet@lwn.net, dmatlack@google.com, Paolo Bonzini , Andrew Morton , vannapurve@google.com, Pratyush Yadav , david@redhat.com, aneesh.kumar@kernel.org, vipinsh@google.com, Alexander Graf , David Hildenbrand , Pasha 
Tatashin Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kexec@lists.infradead.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, Tarun Sahu Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Refactor `____vm_create()` in the KVM selftest library to extract its initialization steps into separate, reusable internal helpers. Introduce `vm_init_fields()` and `vm_init_memory_properties()`. This allows advanced test setups to perform targeted VM fields or memory property initializations independently, which is required by upcoming test cases that restore preserved VMs. No functional changes are introduced for the existing tests. Signed-off-by: Tarun Sahu --- .../testing/selftests/kvm/include/kvm_util.h | 2 ++ tools/testing/selftests/kvm/lib/kvm_util.c | 26 +++++++++++++------ 2 files changed, 20 insertions(+), 8 deletions(-) diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing= /selftests/kvm/include/kvm_util.h index 2ecaaa0e9965..d10cd25d0658 100644 --- a/tools/testing/selftests/kvm/include/kvm_util.h +++ b/tools/testing/selftests/kvm/include/kvm_util.h @@ -471,6 +471,8 @@ const char *vm_guest_mode_string(u32 i); =20 void kvm_vm_free(struct kvm_vm *vmp); void kvm_vm_restart(struct kvm_vm *vmp); +void vm_init_fields(struct kvm_vm *vm, struct vm_shape shape); +void vm_init_memory_properties(struct kvm_vm *vm); void kvm_vm_release(struct kvm_vm *vmp); void kvm_vm_elf_load(struct kvm_vm *vm, const char *filename); int kvm_memfd_alloc(size_t size, bool hugepages); diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/sel= ftests/kvm/lib/kvm_util.c index 2a76eca7029d..f4cd06d34ce9 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -276,13 +276,8 @@ __weak void vm_populate_gva_bitmap(struct kvm_vm *vm) (1ULL << (vm->va_bits - 1)) >> vm->page_shift); } =20 -struct kvm_vm *____vm_create(struct vm_shape shape) +void 
vm_init_fields(struct kvm_vm *vm, struct vm_shape shape) { - struct kvm_vm *vm; - - vm =3D calloc(1, sizeof(*vm)); - TEST_ASSERT(vm !=3D NULL, "Insufficient Memory"); - INIT_LIST_HEAD(&vm->vcpus); vm->regions.gpa_tree =3D RB_ROOT; vm->regions.hva_tree =3D RB_ROOT; @@ -380,9 +375,10 @@ struct kvm_vm *____vm_create(struct vm_shape shape) if (vm->pa_bits !=3D 40) vm->type =3D KVM_VM_TYPE_ARM_IPA_SIZE(vm->pa_bits); #endif +} =20 - vm_open(vm); - +void vm_init_memory_properties(struct kvm_vm *vm) +{ /* Limit to VA-bit canonical virtual addresses. */ vm->vpages_valid =3D sparsebit_alloc(); vm_populate_gva_bitmap(vm); @@ -392,6 +388,20 @@ struct kvm_vm *____vm_create(struct vm_shape shape) =20 /* Allocate and setup memory for guest. */ vm->vpages_mapped =3D sparsebit_alloc(); +} + +struct kvm_vm *____vm_create(struct vm_shape shape) +{ + struct kvm_vm *vm; + + vm =3D calloc(1, sizeof(*vm)); + TEST_ASSERT(vm !=3D NULL, "Insufficient Memory"); + + vm_init_fields(vm, shape); + + vm_open(vm); + + vm_init_memory_properties(vm); =20 return vm; } --=20 2.54.0.563.g4f69b47b94-goog
From nobody Mon May 25 09:56:50 2026
Date: Mon, 18 May 2026 09:36:39 +0000 In-Reply-To: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.54.0.563.g4f69b47b94-goog Message-ID: Subject: [RFC PATCH v1 9/9] selftests: kvm: Add guest_memfd_preservation_test From: Tarun Sahu To: axelrasmussen@google.com, mark.rutland@arm.com, skhawaja@google.com, Mike Rapoport , sagis@google.com, Jason Gunthorpe , Shuah Khan , ackerleytng@google.com, corbet@lwn.net, dmatlack@google.com, Paolo Bonzini , Andrew Morton , vannapurve@google.com, Pratyush Yadav , david@redhat.com, aneesh.kumar@kernel.org, vipinsh@google.com, Alexander Graf , David Hildenbrand , Pasha Tatashin Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kexec@lists.infradead.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, Tarun Sahu Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add a new KVM selftest `guest_memfd_preservation_test` to verify that guest memory backed by guest_memfd is preserved properly.
The test leverages the Live Update Orchestrator (LUO) infrastructure to validate that memory folios and configuration layouts are saved and then restored across a kernel live update, so that the guest loses no memory. The test uses the KVM selftests framework to create a new VM and map two memory slots into it: one holds the code executed inside the VM, and the other is backed by the guest_memfd that the guest code writes to. In Phase 1, the guest writes the data and exits; the test then preserves the VM and guest_memfd file descriptors via LUO and waits for the user to trigger the kexec. In Phase 2, a new VM is created from the retrieved VM file descriptor and two memory slots are assigned again: one for the guest code and one for the retrieved guest_memfd, whose contents the guest code verifies. If verification succeeds, the test passes.
Signed-off-by: Tarun Sahu --- MAINTAINERS | 1 + tools/testing/selftests/kvm/Makefile.kvm | 2 + .../kvm/guest_memfd_preservation_test.c | 285 ++++++++++++++++++ 3 files changed, 288 insertions(+) create mode 100644 tools/testing/selftests/kvm/guest_memfd_preservation_test.c diff --git a/MAINTAINERS b/MAINTAINERS index e5402a56ab98..647d60f6a1e2 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -14413,6 +14413,7 @@ L: kexec@lists.infradead.org L: kvm@vger.kernel.org S: Maintained T: git git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git +F: tools/testing/selftests/kvm/guest_memfd_preservation_test.c F: virt/kvm/guest_memfd_luo.c F: virt/kvm/kvm_luo.c =20 diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm index 9118a5a51b89..4ea6cb7bf001 100644 --- a/tools/testing/selftests/kvm/Makefile.kvm +++ b/tools/testing/selftests/kvm/Makefile.kvm @@ -161,6 +161,8 @@ TEST_GEN_PROGS_x86 +=3D pre_fault_memory_test =20 # Compiled outputs used by test targets TEST_GEN_PROGS_EXTENDED_x86 +=3D x86/nx_huge_pages_test +# Manual test that forks a persistent background daemon; skip auto CI run +TEST_GEN_PROGS_EXTENDED_x86 +=3D
guest_memfd_preservation_test =20 TEST_GEN_PROGS_arm64 =3D $(TEST_GEN_PROGS_COMMON) TEST_GEN_PROGS_arm64 +=3D arm64/aarch32_id_regs diff --git a/tools/testing/selftests/kvm/guest_memfd_preservation_test.c b/= tools/testing/selftests/kvm/guest_memfd_preservation_test.c new file mode 100644 index 000000000000..ad7b305b48c3 --- /dev/null +++ b/tools/testing/selftests/kvm/guest_memfd_preservation_test.c @@ -0,0 +1,285 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2026, Google LLC. + * + * Author: Tarun Sahu + * + * Test for VM and guest_memfd preservation across kexec (Live Update) via= LUO. + * + * NOTE: This is a MANUAL test and is excluded from automated CI/testing + * frameworks because Phase 1 daemonizes into the background to pin resour= ces + * and requires a human operator to manually trigger kexec before Phase 2 + * is executed. Running Phase 1 automatically would leak the background da= emon + * and cause CI runners to falsely interpret it as a passed test. + * + * Usage: + * Phase 1: ./guest_memfd_preservation_test + * Phase 2: ./guest_memfd_preservation_test --phase2 + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "kvm_util.h" +#include "processor.h" +#include "test_util.h" +#include "ucall_common.h" +#include "../kselftest.h" +#include "../kselftest_harness.h" + +#include "../../../../include/uapi/linux/liveupdate.h" + +#define SESSION_NAME "gmem_vm_preservation_session" +#define VM_TOKEN 0x1001 +#define GMEM_TOKEN 0x1002 + +#define GMEM_SIZE (16ULL * 1024 * 1024) +#define DATA_SIZE (5ULL * 1024 * 1024) + +static size_t page_size; + +/* Deterministic byte pattern generation based on offset */ +static inline uint8_t get_pattern_byte(size_t offset) +{ + return (uint8_t)(offset ^ 0x5A); +} + +static void guest_code_phase1(uint64_t gpa, uint64_t size, uint64_t data_s= ize) +{ + uint8_t *mem =3D (uint8_t *)gpa; + size_t i; + + for (i =3D 0; i < data_size; 
i++) + mem[i] =3D get_pattern_byte(i); + + GUEST_DONE(); +} + +static void guest_code_phase2(uint64_t gpa, uint64_t size, uint64_t data_s= ize) +{ + uint8_t *mem =3D (uint8_t *)gpa; + size_t i; + + for (i =3D 0; i < data_size; i++) { + uint8_t val =3D get_pattern_byte(i); + + __GUEST_ASSERT(mem[i] =3D=3D val, + "Data mismatch at offset %lu! Expected 0x%x, got 0x%x", + i, val, mem[i]); + } + + GUEST_DONE(); +} + +static void do_phase1(void) +{ + uint64_t flags =3D GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED; + int gmem_fd, dev_luo_fd, ret; + const uint64_t gpa =3D SZ_4G; + struct kvm_vcpu *vcpu; + const int slot =3D 1; + struct kvm_vm *vm; + struct liveupdate_ioctl_create_session create_sess =3D { + .size =3D sizeof(create_sess), + .name =3D SESSION_NAME, + }; + struct liveupdate_session_preserve_fd preserve_vm =3D { + .size =3D sizeof(preserve_vm), + .token =3D VM_TOKEN, + }; + struct liveupdate_session_preserve_fd preserve_gmem =3D { + .size =3D sizeof(preserve_gmem), + .token =3D GMEM_TOKEN, + }; + + vm =3D __vm_create_shape_with_one_vcpu(VM_SHAPE_DEFAULT, &vcpu, 1, + guest_code_phase1); + gmem_fd =3D vm_create_guest_memfd(vm, GMEM_SIZE, flags); + vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, GMEM_SIZE,= NULL, + gmem_fd, 0); + ret =3D fallocate(gmem_fd, FALLOC_FL_KEEP_SIZE, 0, GMEM_SIZE); + TEST_ASSERT(!ret, "fallocate failed, errno =3D %d (%s)", errno, strerror(= errno)); + + for (size_t i =3D 0; i < GMEM_SIZE; i +=3D page_size) + virt_pg_map(vm, gpa + i, gpa + i); + + vcpu_args_set(vcpu, 3, gpa, GMEM_SIZE, DATA_SIZE); + + vcpu_run(vcpu); + TEST_ASSERT_EQ(get_ucall(vcpu, NULL), UCALL_DONE); + + dev_luo_fd =3D open("/dev/liveupdate", O_RDWR); + TEST_ASSERT(dev_luo_fd >=3D 0, "Failed to open /dev/liveupdate"); + + TEST_ASSERT(ioctl(dev_luo_fd, LIVEUPDATE_IOCTL_CREATE_SESSION, + &create_sess) =3D=3D 0, + "Failed to create LUO session"); + TEST_ASSERT(create_sess.fd >=3D 0, "Invalid session fd"); + + preserve_vm.fd =3D vm->fd; + 
TEST_ASSERT(ioctl(create_sess.fd, LIVEUPDATE_SESSION_PRESERVE_FD, + &preserve_vm) =3D=3D 0, + "Failed to preserve VM file descriptor"); + + preserve_gmem.fd =3D gmem_fd; + TEST_ASSERT(ioctl(create_sess.fd, LIVEUPDATE_SESSION_PRESERVE_FD, + &preserve_gmem) =3D=3D 0, + "Failed to preserve guest_memfd file descriptor"); + + printf("\n=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D\n"); + printf("Phase 1 Complete Successfully!\n"); + printf("VM file and guest_memfd file have been preserved via LUO.\n"); + printf("Tokens: VM_TOKEN=3D0x%x, GMEM_TOKEN=3D0x%x\n", VM_TOKEN, GMEM_TOK= EN); + printf("Machine Size: %llu MB, Data Size: %llu MB\n", GMEM_SIZE / SZ_1M, + DATA_SIZE / SZ_1M); + printf("------------------------------------------------------------\n"); + + pid_t pid; + + printf("Forking background process to hold sessions open...\n"); + pid =3D fork(); + TEST_ASSERT(pid >=3D 0, "fork failed"); + + if (pid > 0) { + printf("Background child process PID: %d. 
Resources are pinned.\n", pid); + printf("ACTION REQUIRED: Trigger kexec now to boot into Phase 2 kernel.\= n"); + exit(EXIT_SUCCESS); + } + + /* Child process: detach from terminal and hold resources */ + if (setsid() < 0) + exit(EXIT_FAILURE); + + close(STDIN_FILENO); + close(STDOUT_FILENO); + close(STDERR_FILENO); + + while (1) + sleep(60); +} + +static struct kvm_vm *vm_create_from_fd(int resurrected_vm_fd, + struct vm_shape shape) +{ + struct kvm_vm *vm; + + vm =3D calloc(1, sizeof(*vm)); + TEST_ASSERT(vm !=3D NULL, "Insufficient Memory"); + + vm_init_fields(vm, shape); + + vm->kvm_fd =3D open_path_or_exit(KVM_DEV_PATH, O_RDWR); + vm->fd =3D resurrected_vm_fd; + + if (kvm_has_cap(KVM_CAP_BINARY_STATS_FD)) + vm->stats.fd =3D vm_get_stats_fd(vm); + else + vm->stats.fd =3D -1; + + vm_init_memory_properties(vm); + + return vm; +} + +static void do_phase2(void) +{ + int retrieved_vm_fd, retrieved_gmem_fd, dev_luo_fd; + struct vm_shape shape =3D VM_SHAPE_DEFAULT; + const uint64_t gpa =3D SZ_4G; + struct kvm_vcpu *vcpu; + const int slot =3D 1; + struct kvm_vm *vm; + struct liveupdate_ioctl_retrieve_session retrieve_sess =3D { + .size =3D sizeof(retrieve_sess), + .name =3D SESSION_NAME, + }; + struct liveupdate_session_retrieve_fd retrieve_vm =3D { + .size =3D sizeof(retrieve_vm), + .token =3D VM_TOKEN, + }; + struct liveupdate_session_retrieve_fd retrieve_gmem =3D { + .size =3D sizeof(retrieve_gmem), + .token =3D GMEM_TOKEN, + }; + + dev_luo_fd =3D open("/dev/liveupdate", O_RDWR); + TEST_ASSERT(dev_luo_fd >=3D 0, "Failed to open /dev/liveupdate"); + + TEST_ASSERT(ioctl(dev_luo_fd, LIVEUPDATE_IOCTL_RETRIEVE_SESSION, &retriev= e_sess) =3D=3D 0, + "Failed to retrieve LUO session"); + TEST_ASSERT(retrieve_sess.fd >=3D 0, "Invalid retrieved session fd"); + + TEST_ASSERT(ioctl(retrieve_sess.fd, LIVEUPDATE_SESSION_RETRIEVE_FD, &retr= ieve_vm) =3D=3D 0, + "Failed to retrieve VM file descriptor"); + retrieved_vm_fd =3D retrieve_vm.fd; + + TEST_ASSERT(ioctl(retrieve_sess.fd, 
LIVEUPDATE_SESSION_RETRIEVE_FD, &retr= ieve_gmem) =3D=3D 0, + "Failed to retrieve guest_memfd file descriptor"); + retrieved_gmem_fd =3D retrieve_gmem.fd; + + vm =3D vm_create_from_fd(retrieved_vm_fd, shape); + + u64 nr_pages =3D 2048; /* 8MB is plenty for slot0 pages */ + + vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, 0, 0, nr_pages, 0); + kvm_vm_elf_load(vm, program_invocation_name); + + for (int i =3D 0; i < NR_MEM_REGIONS; i++) + vm->memslots[i] =3D 0; + + struct userspace_mem_region *slot0 =3D memslot2region(vm, 0); + + ucall_init(vm, slot0->region.guest_phys_addr + slot0->region.memory_size); + + vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, GMEM_SIZE,= NULL, + retrieved_gmem_fd, 0); + + for (size_t i =3D 0; i < GMEM_SIZE; i +=3D page_size) + virt_pg_map(vm, gpa + i, gpa + i); + + vcpu =3D vm_vcpu_add(vm, 0, guest_code_phase2); + kvm_arch_vm_finalize_vcpus(vm); + + vcpu_args_set(vcpu, 3, gpa, GMEM_SIZE, DATA_SIZE); + + printf("Resuming / Running VM in Phase 2...\n"); + vcpu_run(vcpu); + TEST_ASSERT_EQ(get_ucall(vcpu, NULL), UCALL_DONE); + + printf("\nSUCCESS: Phase 2 Complete! All 5MB complex data verified intact= !\n"); + + close(retrieve_sess.fd); + close(dev_luo_fd); + /* This will also close the vm_fd */ + kvm_vm_free(vm); + close(retrieved_gmem_fd); +} + +int main(int argc, char *argv[]) +{ + bool phase2 =3D false; + + TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD)); + page_size =3D getpagesize(); + + for (int i =3D 1; i < argc; i++) { + if (strcmp(argv[i], "--phase2") =3D=3D 0) + phase2 =3D true; + } + + if (phase2) + do_phase2(); + else + do_phase1(); + + return 0; +} --=20 2.54.0.563.g4f69b47b94-goog