[PATCH for-4.18 v2] automation: fix race condition in adl-suspend test

Marek Marczykowski-Górecki posted 1 patch 6 months ago
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20231031021712.407318-1-marmarek@invisiblethingslab.com
automation/scripts/qubes-x86-64.sh | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
[PATCH for-4.18 v2] automation: fix race condition in adl-suspend test
Posted by Marek Marczykowski-Górecki 6 months ago
If the system suspends too quickly, the message telling the test controller to
wake up the system may not be sent to the console before suspending.
This will cause the test to time out.

Fix this by calling sync on the console and waiting a bit after printing
the message. The test controller then resumes the system 30s after the
message, so as long as the delay plus suspending takes less than 30s, it
is okay.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
This is consistent with the observation that sync_console "fixes" the
issue.

Changes in v2:
- add sync /dev/stdout too (Roger)
---
 automation/scripts/qubes-x86-64.sh | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/automation/scripts/qubes-x86-64.sh b/automation/scripts/qubes-x86-64.sh
index 26131b082671..0f00bebdd8c8 100755
--- a/automation/scripts/qubes-x86-64.sh
+++ b/automation/scripts/qubes-x86-64.sh
@@ -54,11 +54,12 @@ until grep 'domU started' /var/log/xen/console/guest-domU.log; do
     sleep 1
 done
 echo \"${wait_and_wakeup}\"
+# let the above message flow to console, then suspend
+sync /dev/stdout
+sleep 5
 set -x
 echo deep > /sys/power/mem_sleep
 echo mem > /sys/power/state
-# now wait for resume
-sleep 5
 xl list
 xl dmesg | grep 'Finishing wakeup from ACPI S3 state' || exit 1
 # check if domU is still alive
-- 
2.41.0
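
The timing constraint from the commit message can be sketched as a small check. This is only an illustration, not part of the patch: the helper name fits_wakeup_window and the 10s suspend estimate are assumptions; the 5s delay and 30s wake-up window are taken from the patch and its description.

```shell
#!/bin/sh
# Hypothetical helper (not in qubes-x86-64.sh): succeeds when the
# pre-suspend delay plus the time needed to suspend is shorter than the
# controller's wake-up window. All values are in seconds.
fits_wakeup_window() {
    delay=$1          # sleep between printing the message and suspending
    suspend_time=$2   # assumed time the system needs to enter S3
    window=$3         # time after the message at which the controller wakes the system
    [ $((delay + suspend_time)) -lt "$window" ]
}

# 5s delay (from the patch), an assumed 10s to suspend, 30s window.
if fits_wakeup_window 5 10 30; then
    echo "ok: system is asleep before the wake-up is sent"
else
    echo "too slow: wake-up may arrive before suspend completes"
fi
```

With these numbers the check passes; it would fail once the delay and suspend time together reach 30s, which is the case the commit message warns about.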


Re: [PATCH for-4.18 v2] automation: fix race condition in adl-suspend test
Posted by Henry Wang 6 months ago
Hi Marek,

> On Oct 31, 2023, at 10:16, Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> wrote:
> 
> If system suspends too quickly, the message for the test controller to
> wake up the system may be not sent to the console before suspending.
> This will cause the test to timeout.
> 
> Fix this by calling sync on the console and waiting a bit after printing
> the message. The test controller then resumes the system 30s after the
> message, so as long as the delay + suspending takes less time it is
> okay.
> 
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

I think now that we have branched, this patch should be committed to both staging and staging-4.18.
For staging-4.18:

Release-acked-by: Henry Wang <Henry.Wang@arm.com>

I will remove the commit moratorium for staging once OSSTest does a successful sync between
staging and master. Thanks.

Kind regards,
Henry

Re: [PATCH for-4.18 v2] automation: fix race condition in adl-suspend test
Posted by Andrew Cooper 6 months ago
On 31/10/2023 9:58 am, Henry Wang wrote:
> Hi Marek,
>
>> On Oct 31, 2023, at 10:16, Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> wrote:
>>
>> If system suspends too quickly, the message for the test controller to
>> wake up the system may be not sent to the console before suspending.
>> This will cause the test to timeout.
>>
>> Fix this by calling sync on the console and waiting a bit after printing
>> the message. The test controller then resumes the system 30s after the
>> message, so as long as the delay + suspending takes less time it is
>> okay.
>>
>> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> I think now that we branched, this patch should be committed to both staging and staging-4.18.
> For staging 4.18:
>
> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
>
> I will remove the commit moratorium for staging once OSSTest does a successful sync between
> staging and master. Thanks.

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

I'll get this sorted now.

~Andrew