[RFC PATCH 5/8] tests_pytest: Implement fetch_asset() method for downloading assets

Thomas Huth posted 8 patches 4 months, 2 weeks ago
Maintainers: Paolo Bonzini <pbonzini@redhat.com>, "Alex Bennée" <alex.bennee@linaro.org>, Thomas Huth <thuth@redhat.com>
There is a newer version of this series
[RFC PATCH 5/8] tests_pytest: Implement fetch_asset() method for downloading assets
Posted by Thomas Huth 4 months, 2 weeks ago
In the pytests, we cannot use the fetch_asset() function from Avocado
anymore, so we have to provide our own implementation now instead.
Thus add such a function based on the _download_with_cache() function
from tests/vm/basevm.py for this purpose.

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 tests/pytest/qemu_pytest/__init__.py | 40 ++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/tests/pytest/qemu_pytest/__init__.py b/tests/pytest/qemu_pytest/__init__.py
index e3ed32e3de..73d80b3828 100644
--- a/tests/pytest/qemu_pytest/__init__.py
+++ b/tests/pytest/qemu_pytest/__init__.py
@@ -11,6 +11,7 @@
 # This work is licensed under the terms of the GNU GPL, version 2 or
 # later.  See the COPYING file in the top-level directory.
 
+import hashlib
 import logging
 import os
 import shutil
@@ -201,17 +202,34 @@ def setUp(self, bin_prefix):
         self.assertIsNotNone(SOURCE_DIR,'PYTEST_SOURCE_ROOT must be set')
         self.assertIsNotNone(self.qemu_bin, 'PYTEST_QEMU_BINARY must be set')
 
-    def fetch_asset(self, name,
-                    asset_hash, algorithm=None,
-                    locations=None, expire=None,
-                    find_only=False, cancel_on_missing=True):
-        return super().fetch_asset(name,
-                        asset_hash=asset_hash,
-                        algorithm=algorithm,
-                        locations=locations,
-                        expire=expire,
-                        find_only=find_only,
-                        cancel_on_missing=cancel_on_missing)
+    def check_hash(self, file_name, expected_hash):
+        if not expected_hash:
+            return True
+        if len(expected_hash) == 32:
+            sum_prog = 'md5sum'
+        elif len(expected_hash) == 40:
+            sum_prog = 'sha1sum'
+        elif len(expected_hash) == 64:
+            sum_prog = 'sha256sum'
+        elif len(expected_hash) == 128:
+            sum_prog = 'sha512sum'
+        else:
+            raise Exception("unknown hash type")
+        checksum = subprocess.check_output([sum_prog, file_name]).split()[0]
+        return expected_hash == checksum.decode("utf-8")
+
+    def fetch_asset(self, url, asset_hash):
+        cache_dir = os.path.expanduser("~/.cache/qemu/download")
+        if not os.path.exists(cache_dir):
+            os.makedirs(cache_dir)
+        fname = os.path.join(cache_dir,
+                             hashlib.sha1(url.encode("utf-8")).hexdigest())
+        if os.path.exists(fname) and self.check_hash(fname, asset_hash):
+            return fname
+        logging.debug("Downloading %s to %s...", url, fname)
+        subprocess.check_call(["wget", "-c", url, "-O", fname + ".download"])
+        os.rename(fname + ".download", fname)
+        return fname
 
 
 class QemuSystemTest(QemuBaseTest):
-- 
2.45.2
Re: [RFC PATCH 5/8] tests_pytest: Implement fetch_asset() method for downloading assets
Posted by Daniel P. Berrangé 4 months, 2 weeks ago
On Thu, Jul 11, 2024 at 01:55:43PM +0200, Thomas Huth wrote:
> In the pytests, we cannot use the fetch_asset() function from Avocado
> anymore, so we have to provide our own implementation now instead.
> Thus add such a function based on the _download_with_cache() function
> from tests/vm/basevm.py for this purpose.
> 
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  tests/pytest/qemu_pytest/__init__.py | 40 ++++++++++++++++++++--------
>  1 file changed, 29 insertions(+), 11 deletions(-)
> 
> diff --git a/tests/pytest/qemu_pytest/__init__.py b/tests/pytest/qemu_pytest/__init__.py
> index e3ed32e3de..73d80b3828 100644
> --- a/tests/pytest/qemu_pytest/__init__.py
> +++ b/tests/pytest/qemu_pytest/__init__.py
> @@ -11,6 +11,7 @@
>  # This work is licensed under the terms of the GNU GPL, version 2 or
>  # later.  See the COPYING file in the top-level directory.
>  
> +import hashlib
>  import logging
>  import os
>  import shutil
> @@ -201,17 +202,34 @@ def setUp(self, bin_prefix):
>          self.assertIsNotNone(SOURCE_DIR,'PYTEST_SOURCE_ROOT must be set')
>          self.assertIsNotNone(self.qemu_bin, 'PYTEST_QEMU_BINARY must be set')
>  
> -    def fetch_asset(self, name,
> -                    asset_hash, algorithm=None,
> -                    locations=None, expire=None,
> -                    find_only=False, cancel_on_missing=True):
> -        return super().fetch_asset(name,
> -                        asset_hash=asset_hash,
> -                        algorithm=algorithm,
> -                        locations=locations,
> -                        expire=expire,
> -                        find_only=find_only,
> -                        cancel_on_missing=cancel_on_missing)
> +    def check_hash(self, file_name, expected_hash):
> +        if not expected_hash:
> +            return True
> +        if len(expected_hash) == 32:
> +            sum_prog = 'md5sum'
> +        elif len(expected_hash) == 40:
> +            sum_prog = 'sha1sum'
> +        elif len(expected_hash) == 64:
> +            sum_prog = 'sha256sum'
> +        elif len(expected_hash) == 128:
> +            sum_prog = 'sha512sum'
> +        else:
> +            raise Exception("unknown hash type")

Why shouldn't we just standardize on sha256 as we convert each test
to pytest ? sha512 is overkill, and md5/sha1 shouldn't really be used
anymore.

> +        checksum = subprocess.check_output([sum_prog, file_name]).split()[0]
> +        return expected_hash == checksum.decode("utf-8")
> +
> +    def fetch_asset(self, url, asset_hash):
> +        cache_dir = os.path.expanduser("~/.cache/qemu/download")
> +        if not os.path.exists(cache_dir):
> +            os.makedirs(cache_dir)
> +        fname = os.path.join(cache_dir,
> +                             hashlib.sha1(url.encode("utf-8")).hexdigest())
> +        if os.path.exists(fname) and self.check_hash(fname, asset_hash):
> +            return fname
> +        logging.debug("Downloading %s to %s...", url, fname)
> +        subprocess.check_call(["wget", "-c", url, "-O", fname + ".download"])
> +        os.rename(fname + ".download", fname)
> +        return fname

To avoid a dep on an external command that may not be installed,
I think we could replace wget with native python code:

 import urllib
 from shutil import copyfileobj
 
 with urllib.request.urlopen(url) as src:
    with open(fname + ".download", "w+") as dst
       copyfileobj(src, dst)


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [RFC PATCH 5/8] tests_pytest: Implement fetch_asset() method for downloading assets
Posted by Thomas Huth 4 months, 2 weeks ago
On 12/07/2024 11.09, Daniel P. Berrangé wrote:
> On Thu, Jul 11, 2024 at 01:55:43PM +0200, Thomas Huth wrote:
>> In the pytests, we cannot use the fetch_asset() function from Avocado
>> anymore, so we have to provide our own implementation now instead.
>> Thus add such a function based on the _download_with_cache() function
>> from tests/vm/basevm.py for this purpose.
>>
>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>> ---
>>   tests/pytest/qemu_pytest/__init__.py | 40 ++++++++++++++++++++--------
>>   1 file changed, 29 insertions(+), 11 deletions(-)
>>
>> diff --git a/tests/pytest/qemu_pytest/__init__.py b/tests/pytest/qemu_pytest/__init__.py
>> index e3ed32e3de..73d80b3828 100644
>> --- a/tests/pytest/qemu_pytest/__init__.py
>> +++ b/tests/pytest/qemu_pytest/__init__.py
>> @@ -11,6 +11,7 @@
>>   # This work is licensed under the terms of the GNU GPL, version 2 or
>>   # later.  See the COPYING file in the top-level directory.
>>   
>> +import hashlib
>>   import logging
>>   import os
>>   import shutil
>> @@ -201,17 +202,34 @@ def setUp(self, bin_prefix):
>>           self.assertIsNotNone(SOURCE_DIR,'PYTEST_SOURCE_ROOT must be set')
>>           self.assertIsNotNone(self.qemu_bin, 'PYTEST_QEMU_BINARY must be set')
>>   
>> -    def fetch_asset(self, name,
>> -                    asset_hash, algorithm=None,
>> -                    locations=None, expire=None,
>> -                    find_only=False, cancel_on_missing=True):
>> -        return super().fetch_asset(name,
>> -                        asset_hash=asset_hash,
>> -                        algorithm=algorithm,
>> -                        locations=locations,
>> -                        expire=expire,
>> -                        find_only=find_only,
>> -                        cancel_on_missing=cancel_on_missing)
>> +    def check_hash(self, file_name, expected_hash):
>> +        if not expected_hash:
>> +            return True
>> +        if len(expected_hash) == 32:
>> +            sum_prog = 'md5sum'
>> +        elif len(expected_hash) == 40:
>> +            sum_prog = 'sha1sum'
>> +        elif len(expected_hash) == 64:
>> +            sum_prog = 'sha256sum'
>> +        elif len(expected_hash) == 128:
>> +            sum_prog = 'sha512sum'
>> +        else:
>> +            raise Exception("unknown hash type")
> 
> Why shouldn't we just standardize on sha256 as we convert each test
> to pytest ? sha512 is overkill, and md5/sha1 shouldn't really be used
> anymore.

I mainly added that for minimizing the changes that I need to do on the 
existing tests. Updating all the hashsums there is certainly some additional 
work... If you want to help, feel free to send patches for the existing 
avocado tests to update the md5 and sha1 sums there!

>> +        checksum = subprocess.check_output([sum_prog, file_name]).split()[0]
>> +        return expected_hash == checksum.decode("utf-8")
>> +
>> +    def fetch_asset(self, url, asset_hash):
>> +        cache_dir = os.path.expanduser("~/.cache/qemu/download")
>> +        if not os.path.exists(cache_dir):
>> +            os.makedirs(cache_dir)
>> +        fname = os.path.join(cache_dir,
>> +                             hashlib.sha1(url.encode("utf-8")).hexdigest())
>> +        if os.path.exists(fname) and self.check_hash(fname, asset_hash):
>> +            return fname
>> +        logging.debug("Downloading %s to %s...", url, fname)
>> +        subprocess.check_call(["wget", "-c", url, "-O", fname + ".download"])
>> +        os.rename(fname + ".download", fname)
>> +        return fname
> 
> To avoid a dep on an external command that may not be installed,
> I think we could replace wget with native python code:
> 
>   import urllib
...

Yes, I came across urllib after I sent out the patches, too, sounds like the 
right way to go. We should also update tests/vm/basevm.py accordingly!

  Thomas



Re: [RFC PATCH 5/8] tests_pytest: Implement fetch_asset() method for downloading assets
Posted by Richard Henderson 4 months, 2 weeks ago
On 7/11/24 04:55, Thomas Huth wrote:
> +    def fetch_asset(self, url, asset_hash):
> +        cache_dir = os.path.expanduser("~/.cache/qemu/download")
> +        if not os.path.exists(cache_dir):
> +            os.makedirs(cache_dir)
> +        fname = os.path.join(cache_dir,
> +                             hashlib.sha1(url.encode("utf-8")).hexdigest())
> +        if os.path.exists(fname) and self.check_hash(fname, asset_hash):
> +            return fname
> +        logging.debug("Downloading %s to %s...", url, fname)
> +        subprocess.check_call(["wget", "-c", url, "-O", fname + ".download"])
> +        os.rename(fname + ".download", fname)
> +        return fname

Download failure via exception?
Check hash on downloaded asset?


r~
Re: [RFC PATCH 5/8] tests_pytest: Implement fetch_asset() method for downloading assets
Posted by Thomas Huth 4 months, 2 weeks ago
On 11/07/2024 18.45, Richard Henderson wrote:
> On 7/11/24 04:55, Thomas Huth wrote:
>> +    def fetch_asset(self, url, asset_hash):
>> +        cache_dir = os.path.expanduser("~/.cache/qemu/download")
>> +        if not os.path.exists(cache_dir):
>> +            os.makedirs(cache_dir)
>> +        fname = os.path.join(cache_dir,
>> +                             hashlib.sha1(url.encode("utf-8")).hexdigest())
>> +        if os.path.exists(fname) and self.check_hash(fname, asset_hash):
>> +            return fname
>> +        logging.debug("Downloading %s to %s...", url, fname)
>> +        subprocess.check_call(["wget", "-c", url, "-O", fname + 
>> ".download"])
>> +        os.rename(fname + ".download", fname)
>> +        return fname
> 
> Download failure via exception?
> Check hash on downloaded asset?

Yes, you're right, I'll add that. But that means it's also missing from 
_download_with_cache in tests/vm/basevm.py so far...

  Thomas


Re: [RFC PATCH 5/8] tests_pytest: Implement fetch_asset() method for downloading assets
Posted by Richard Henderson 4 months, 2 weeks ago
On 7/11/24 09:45, Richard Henderson wrote:
> On 7/11/24 04:55, Thomas Huth wrote:
>> +    def fetch_asset(self, url, asset_hash):
>> +        cache_dir = os.path.expanduser("~/.cache/qemu/download")
>> +        if not os.path.exists(cache_dir):
>> +            os.makedirs(cache_dir)
>> +        fname = os.path.join(cache_dir,
>> +                             hashlib.sha1(url.encode("utf-8")).hexdigest())
>> +        if os.path.exists(fname) and self.check_hash(fname, asset_hash):
>> +            return fname
>> +        logging.debug("Downloading %s to %s...", url, fname)
>> +        subprocess.check_call(["wget", "-c", url, "-O", fname + ".download"])
>> +        os.rename(fname + ".download", fname)
>> +        return fname
> 
> Download failure via exception?
> Check hash on downloaded asset?

I would prefer to see assets, particularly downloading, handled in a separate pass from tests.

(1) Asset download should not count against test timeout.
(2) Running tests while disconnected should skip unavailable assets.

Avocado kinda does this, but still generates errors instead of skips.


r~


Re: [RFC PATCH 5/8] tests_pytest: Implement fetch_asset() method for downloading assets
Posted by Thomas Huth 4 months, 2 weeks ago
On 11/07/2024 20.49, Richard Henderson wrote:
> On 7/11/24 09:45, Richard Henderson wrote:
>> On 7/11/24 04:55, Thomas Huth wrote:
>>> +    def fetch_asset(self, url, asset_hash):
>>> +        cache_dir = os.path.expanduser("~/.cache/qemu/download")
>>> +        if not os.path.exists(cache_dir):
>>> +            os.makedirs(cache_dir)
>>> +        fname = os.path.join(cache_dir,
>>> +                             hashlib.sha1(url.encode("utf-8")).hexdigest())
>>> +        if os.path.exists(fname) and self.check_hash(fname, asset_hash):
>>> +            return fname
>>> +        logging.debug("Downloading %s to %s...", url, fname)
>>> +        subprocess.check_call(["wget", "-c", url, "-O", fname + 
>>> ".download"])
>>> +        os.rename(fname + ".download", fname)
>>> +        return fname
>>
>> Download failure via exception?
>> Check hash on downloaded asset?
> 
> I would prefer to see assets, particularly downloading, handled in a 
> separate pass from tests.
> 
> (1) Asset download should not count against test timeout.
> (2) Running tests while disconnected should skip unavailable assets.

Sounds good, but honestly, that's above my primitive python-fu. I guess that 
would need some kind of introspection of the tests to retrieve the 
information what they want to download?

  Thomas



Re: [RFC PATCH 5/8] tests_pytest: Implement fetch_asset() method for downloading assets
Posted by Alex Bennée 4 months, 2 weeks ago
Richard Henderson <richard.henderson@linaro.org> writes:

> On 7/11/24 09:45, Richard Henderson wrote:
>> On 7/11/24 04:55, Thomas Huth wrote:
>>> +    def fetch_asset(self, url, asset_hash):
>>> +        cache_dir = os.path.expanduser("~/.cache/qemu/download")
>>> +        if not os.path.exists(cache_dir):
>>> +            os.makedirs(cache_dir)
>>> +        fname = os.path.join(cache_dir,
>>> +                             hashlib.sha1(url.encode("utf-8")).hexdigest())
>>> +        if os.path.exists(fname) and self.check_hash(fname, asset_hash):
>>> +            return fname
>>> +        logging.debug("Downloading %s to %s...", url, fname)
>>> +        subprocess.check_call(["wget", "-c", url, "-O", fname + ".download"])
>>> +        os.rename(fname + ".download", fname)
>>> +        return fname
>> Download failure via exception?
>> Check hash on downloaded asset?
>
> I would prefer to see assets, particularly downloading, handled in a
> separate pass from tests.

And I assume cachable?

>
> (1) Asset download should not count against test timeout.
> (2) Running tests while disconnected should skip unavailable assets.
>
> Avocado kinda does this, but still generates errors instead of skips.
>
>
> r~

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
Re: [RFC PATCH 5/8] tests_pytest: Implement fetch_asset() method for downloading assets
Posted by Richard Henderson 4 months, 2 weeks ago
On 7/11/24 12:23, Alex Bennée wrote:
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> On 7/11/24 09:45, Richard Henderson wrote:
>>> On 7/11/24 04:55, Thomas Huth wrote:
>>>> +    def fetch_asset(self, url, asset_hash):
>>>> +        cache_dir = os.path.expanduser("~/.cache/qemu/download")
>>>> +        if not os.path.exists(cache_dir):
>>>> +            os.makedirs(cache_dir)
>>>> +        fname = os.path.join(cache_dir,
>>>> +                             hashlib.sha1(url.encode("utf-8")).hexdigest())
>>>> +        if os.path.exists(fname) and self.check_hash(fname, asset_hash):
>>>> +            return fname
>>>> +        logging.debug("Downloading %s to %s...", url, fname)
>>>> +        subprocess.check_call(["wget", "-c", url, "-O", fname + ".download"])
>>>> +        os.rename(fname + ".download", fname)
>>>> +        return fname
>>> Download failure via exception?
>>> Check hash on downloaded asset?
>>
>> I would prefer to see assets, particularly downloading, handled in a
>> separate pass from tests.
> 
> And I assume cachable?

The cache is already handled here.  But downloading after cache miss is non-optional, may 
not fail, and is accounted against the meson test timeout.


r~

Re: [RFC PATCH 5/8] tests_pytest: Implement fetch_asset() method for downloading assets
Posted by Thomas Huth 4 months, 2 weeks ago
On 11/07/2024 23.35, Richard Henderson wrote:
> On 7/11/24 12:23, Alex Bennée wrote:
>> Richard Henderson <richard.henderson@linaro.org> writes:
>>
>>> On 7/11/24 09:45, Richard Henderson wrote:
>>>> On 7/11/24 04:55, Thomas Huth wrote:
>>>>> +    def fetch_asset(self, url, asset_hash):
>>>>> +        cache_dir = os.path.expanduser("~/.cache/qemu/download")
>>>>> +        if not os.path.exists(cache_dir):
>>>>> +            os.makedirs(cache_dir)
>>>>> +        fname = os.path.join(cache_dir,
>>>>> +                             
>>>>> hashlib.sha1(url.encode("utf-8")).hexdigest())
>>>>> +        if os.path.exists(fname) and self.check_hash(fname, asset_hash):
>>>>> +            return fname
>>>>> +        logging.debug("Downloading %s to %s...", url, fname)
>>>>> +        subprocess.check_call(["wget", "-c", url, "-O", fname + 
>>>>> ".download"])
>>>>> +        os.rename(fname + ".download", fname)
>>>>> +        return fname
>>>> Download failure via exception?
>>>> Check hash on downloaded asset?
>>>
>>> I would prefer to see assets, particularly downloading, handled in a
>>> separate pass from tests.
>>
>> And I assume cachable?
> 
> The cache is already handled here.  But downloading after cache miss is 
> non-optional, may not fail, and is accounted against the meson test timeout.

Unless someone has a really good idea how to implement that download before 
running the tests, I think we can start by simply giving enough headroom in 
the initial timeout settings. We can then hopefully easily fine-tune later - 
since this time the framework is under our control, so we don't have to sync 
in a cumbersome way with the avocado test runner any more.

  Thomas