From nobody Tue Dec 2 00:03:19 2025
From: "Chalios, Babis"
To: "richardcochran@gmail.com", "dwmw2@infradead.org",
	"andrew+netdev@lunn.ch", "davem@davemloft.net", "edumazet@google.com",
	"kuba@kernel.org", "pabeni@redhat.com", "netdev@vger.kernel.org",
	"linux-kernel@vger.kernel.org"
CC: "Chalios, Babis", "Graf (AWS), Alexander", "mzxreary@0pointer.de"
Subject: [RFC PATCH 1/2] ptp: vmclock: add vm generation counter
Date: Tue, 25 Nov 2025 15:38:42 +0000
Message-ID: <20251125153830.11487-2-bchalios@amazon.es>
References: <20251125153830.11487-1-bchalios@amazon.es>
In-Reply-To: <20251125153830.11487-1-bchalios@amazon.es>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Similar to live migration, loading a VM from some saved state (aka
snapshot) is also an event that calls for clock adjustments in the
guest. However, guests might want to take more actions in response to
such events, e.g. discarding UUIDs, resetting network connections,
reseeding entropy pools, etc. These are actions that guests don't
typically take during live migration, so add a new field in the
vmclock_abi called vm_generation_counter which informs the guest about
such events.

Signed-off-by: Babis Chalios
---
 include/uapi/linux/vmclock-abi.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/include/uapi/linux/vmclock-abi.h b/include/uapi/linux/vmclock-abi.h
index 2d99b29ac44a..fbf1c5928273 100644
--- a/include/uapi/linux/vmclock-abi.h
+++ b/include/uapi/linux/vmclock-abi.h
@@ -115,6 +115,12 @@ struct vmclock_abi {
 	 * bit again after the update, using the about-to-be-valid fields.
 	 */
 #define VMCLOCK_FLAG_TIME_MONOTONIC		(1 << 7)
+	/*
+	 * If the VM_GEN_COUNTER_PRESENT flag is set, the hypervisor will
+	 * bump the vm_generation_counter field every time the guest is
+	 * loaded from some saved state (restored from a snapshot).
+	 */
+#define VMCLOCK_FLAG_VM_GEN_COUNTER_PRESENT	(1 << 8)
 
 	__u8 pad[2];
 	__u8 clock_status;
@@ -177,6 +183,19 @@ struct vmclock_abi {
 	__le64 time_frac_sec;		/* Units of 1/2^64 of a second */
 	__le64 time_esterror_nanosec;
 	__le64 time_maxerror_nanosec;
+
+	/*
+	 * This field changes to another non-repeating value when the VM
+	 * is loaded from a snapshot. This event, typically, represents a
+	 * "jump" forward in time. As a result, in this case as well, the
+	 * guest needs to discard any calibration against external sources.
+	 * Loading a snapshot in a VM has different semantics than other VM
+	 * events such as live migration, i.e. apart from re-adjusting guest
+	 * clocks a guest user space might want to discard UUIDs, reset
+	 * network connections or reseed entropy, etc. As a result, we
+	 * use a dedicated marker for such events.
+	 */
+	__le64 vm_generation_counter;
 };
 
 #endif /* __VMCLOCK_ABI_H__ */
-- 
2.34.1

From nobody Tue Dec 2 00:03:19 2025
From: "Chalios, Babis"
To:
	"richardcochran@gmail.com", "dwmw2@infradead.org",
	"andrew+netdev@lunn.ch", "davem@davemloft.net", "edumazet@google.com",
	"kuba@kernel.org", "pabeni@redhat.com", "netdev@vger.kernel.org",
	"linux-kernel@vger.kernel.org"
CC: "Chalios, Babis", "Graf (AWS), Alexander", "mzxreary@0pointer.de"
Subject: [RFC PATCH 2/2] ptp: vmclock: support device notifications
Date: Tue, 25 Nov 2025 15:38:53 +0000
Message-ID: <20251125153830.11487-3-bchalios@amazon.es>
References: <20251125153830.11487-1-bchalios@amazon.es>
In-Reply-To: <20251125153830.11487-1-bchalios@amazon.es>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

VMClock now expects the hypervisor to send a device notification every
time the seqcount lock changes to a new (even) value.

Moreover, add support for poll() in VMClock as a means to propagate this
notification to user space. poll() will notify listeners every time
seq_count has changed to a new (even) value since the last time read()
(or open()) was called on the device. This means that when poll()
returns a POLLIN event, listeners need to call read() to observe what
has changed and to update the reader's view of seq_count. In other
words, once poll() has returned, all subsequent calls to poll() will
return immediately with a POLLIN event until the listener calls read().
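
For illustration, a user-space listener following the poll()/read()
contract described above could look roughly like the sketch below. It
is not part of the patch. The helper names vmclock_open() and
vmclock_wait_and_read() are hypothetical, the device path is left to
the caller, and re-reading from offset 0 with pread() is an assumption
made here so that the file position never runs past the end of the
shared structure.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open the vmclock misc device read-only.
 * Per the commit message, open() records the seq_count this file
 * has "seen", so it is the baseline for later poll() calls.
 */
static int vmclock_open(const char *path)
{
	return open(path, O_RDONLY | O_CLOEXEC);
}

/*
 * Hypothetical helper: wait for POLLIN, then read the structure so
 * that subsequent poll() calls stop reporting the same change.
 *
 * Returns the number of bytes read, 0 on timeout, or -1 with errno set.
 */
static ssize_t vmclock_wait_and_read(int fd, void *buf, size_t len,
				     int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	/* Retry if a signal interrupts the wait. */
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret <= 0)
		return ret;	/* 0: timeout, -1: error */

	if (!(pfd.revents & POLLIN)) {
		errno = EIO;	/* POLLERR/POLLHUP without data */
		return -1;
	}

	/* Assumption: always read from offset 0 (see lead-in). */
	return pread(fd, buf, len, 0);
}
```

A listener would call vmclock_open() once and then call
vmclock_wait_and_read() in a loop, inspecting the returned bytes after
each wakeup. Skipping the read would leave poll() returning POLLIN
immediately on every call, as the commit message notes.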

Signed-off-by: Babis Chalios
---
 drivers/ptp/ptp_vmclock.c | 85 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 80 insertions(+), 5 deletions(-)

diff --git a/drivers/ptp/ptp_vmclock.c b/drivers/ptp/ptp_vmclock.c
index b3a83b03d9c1..efcdcc5c40cf 100644
--- a/drivers/ptp/ptp_vmclock.c
+++ b/drivers/ptp/ptp_vmclock.c
@@ -5,6 +5,9 @@
  * Copyright © 2024 Amazon.com, Inc. or its affiliates.
  */
 
+#include "linux/poll.h"
+#include "linux/types.h"
+#include "linux/wait.h"
 #include
 #include
 #include
@@ -39,6 +42,7 @@ struct vmclock_state {
 	struct resource res;
 	struct vmclock_abi *clk;
 	struct miscdevice miscdev;
+	wait_queue_head_t disrupt_wait;
 	struct ptp_clock_info ptp_clock_info;
 	struct ptp_clock *ptp_clock;
 	enum clocksource_ids cs_id, sys_cs_id;
@@ -357,10 +361,15 @@ static struct ptp_clock *vmclock_ptp_register(struct device *dev,
 	return ptp_clock_register(&st->ptp_clock_info, dev);
 }
 
+struct vmclock_file_state {
+	struct vmclock_state *st;
+	uint32_t seq;
+};
+
 static int vmclock_miscdev_mmap(struct file *fp, struct vm_area_struct *vma)
 {
-	struct vmclock_state *st = container_of(fp->private_data,
-						struct vmclock_state, miscdev);
+	struct vmclock_file_state *fst = fp->private_data;
+	struct vmclock_state *st = fst->st;
 
 	if ((vma->vm_flags & (VM_READ|VM_WRITE)) != VM_READ)
 		return -EROFS;
@@ -379,8 +388,9 @@ static int vmclock_miscdev_mmap(struct file *fp, struct vm_area_struct *vma)
 static ssize_t vmclock_miscdev_read(struct file *fp, char __user *buf,
 				    size_t count, loff_t *ppos)
 {
-	struct vmclock_state *st = container_of(fp->private_data,
-						struct vmclock_state, miscdev);
+	struct vmclock_file_state *fst = fp->private_data;
+	struct vmclock_state *st = fst->st;
+
 	ktime_t deadline = ktime_add(ktime_get(), VMCLOCK_MAX_WAIT);
 	size_t max_count;
 	uint32_t seq;
@@ -402,8 +412,10 @@ static ssize_t vmclock_miscdev_read(struct file *fp, char __user *buf,
 
 		/* Pairs with hypervisor wmb */
 		virt_rmb();
-		if (seq == le32_to_cpu(st->clk->seq_count))
+		if (seq == le32_to_cpu(st->clk->seq_count)) {
+			fst->seq = seq;
 			break;
+		}
 
 		if (ktime_after(ktime_get(), deadline))
 			return -ETIMEDOUT;
@@ -413,10 +425,51 @@ static ssize_t vmclock_miscdev_read(struct file *fp, char __user *buf,
 	return count;
 }
 
+static __poll_t vmclock_miscdev_poll(struct file *fp, poll_table *wait)
+{
+	struct vmclock_file_state *fst = fp->private_data;
+	struct vmclock_state *st = fst->st;
+	uint32_t seq;
+
+	poll_wait(fp, &st->disrupt_wait, wait);
+
+	seq = le32_to_cpu(st->clk->seq_count);
+	if (fst->seq != seq)
+		return POLLIN | POLLRDNORM;
+
+	return 0;
+}
+
+static int vmclock_miscdev_open(struct inode *inode, struct file *fp)
+{
+	struct vmclock_state *st = container_of(fp->private_data,
+						struct vmclock_state, miscdev);
+	struct vmclock_file_state *fst = kzalloc(sizeof(*fst), GFP_KERNEL);
+
+	if (!fst)
+		return -ENOMEM;
+
+	fst->st = st;
+	fst->seq = le32_to_cpu(st->clk->seq_count);
+
+	fp->private_data = fst;
+
+	return 0;
+}
+
+static int vmclock_miscdev_release(struct inode *inode, struct file *fp)
+{
+	kfree(fp->private_data);
+	return 0;
+}
+
 static const struct file_operations vmclock_miscdev_fops = {
 	.owner = THIS_MODULE,
+	.open = vmclock_miscdev_open,
+	.release = vmclock_miscdev_release,
 	.mmap = vmclock_miscdev_mmap,
 	.read = vmclock_miscdev_read,
+	.poll = vmclock_miscdev_poll,
 };
 
 /* module operations */
@@ -459,6 +512,16 @@ static acpi_status vmclock_acpi_resources(struct acpi_resource *ares, void *data
 	return AE_ERROR;
 }
 
+static void
+vmclock_acpi_notification_handler(acpi_handle __always_unused handle,
+				  u32 __always_unused event, void *dev)
+{
+	struct device *device = dev;
+	struct vmclock_state *st = device->driver_data;
+
+	wake_up_interruptible(&st->disrupt_wait);
+}
+
 static int vmclock_probe_acpi(struct device *dev, struct vmclock_state *st)
 {
 	struct acpi_device *adev = ACPI_COMPANION(dev);
@@ -479,6 +542,14 @@ static int vmclock_probe_acpi(struct device *dev, struct vmclock_state *st)
 		return -ENODEV;
 	}
 
+	status = acpi_install_notify_handler(adev->handle, ACPI_DEVICE_NOTIFY,
+					     vmclock_acpi_notification_handler,
+					     dev);
+	if (ACPI_FAILURE(status)) {
+		dev_err(dev, "failed to install notification handler");
+		return -ENODEV;
+	}
+
 	return 0;
 }
 
@@ -549,6 +620,8 @@ static int vmclock_probe(struct platform_device *pdev)
 	if (ret)
 		return ret;
 
+	init_waitqueue_head(&st->disrupt_wait);
+
 	/*
 	 * If the structure is big enough, it can be mapped to userspace.
 	 * Theoretically a guest OS even using larger pages could still
@@ -581,6 +654,8 @@ static int vmclock_probe(struct platform_device *pdev)
 		return -ENODEV;
 	}
 
+	dev->driver_data = st;
+
 	dev_info(dev, "%s: registered %s%s%s\n", st->name,
 		 st->miscdev.minor ? "miscdev" : "",
 		 (st->miscdev.minor && st->ptp_clock) ? ", " : "",
-- 
2.34.1
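
As a rough illustration of how guest user space might consume the
vm_generation_counter field from patch 1/2 after such a notification,
the sketch below tracks the last counter value seen and reports when it
changes. It is a sketch under stated assumptions, not code from this
series: struct vmclock_gen_tracker and vmclock_snapshot_restored() are
hypothetical names, and the flags and counter values are assumed to
have been read consistently from the shared structure and already
converted from little-endian to host byte order. Only the flag value is
taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bit as defined in patch 1/2. */
#define VMCLOCK_FLAG_VM_GEN_COUNTER_PRESENT	(1 << 8)

/* Hypothetical per-listener state: last generation value observed. */
struct vmclock_gen_tracker {
	bool valid;		/* true once a first value was recorded */
	uint64_t last_gen;	/* last vm_generation_counter seen */
};

/*
 * Hypothetical helper: returns true if the generation counter differs
 * from the previously recorded value, i.e. the VM appears to have been
 * loaded from a snapshot since the last check. Returns false when the
 * hypervisor does not advertise the counter, and on the very first
 * observation, which only establishes the baseline.
 */
static bool vmclock_snapshot_restored(struct vmclock_gen_tracker *t,
				      uint64_t flags, uint64_t gen)
{
	if (!(flags & VMCLOCK_FLAG_VM_GEN_COUNTER_PRESENT))
		return false;

	if (!t->valid) {
		t->valid = true;
		t->last_gen = gen;
		return false;
	}

	if (gen == t->last_gen)
		return false;

	/* Compare for inequality only: the field is documented as
	 * changing to another non-repeating value, not as increasing. */
	t->last_gen = gen;
	return true;
}
```

When the helper returns true, a listener would take the snapshot-specific
actions named in patch 1/2 (discarding UUIDs, resetting network
connections, reseeding entropy) in addition to the usual clock
re-adjustment.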