I have finished the (small) amount of work needed to remove blobs
completely from patchew and deployed the change to next.patchew.org.
The bulk of the work can be done while the server is up, thanks to a
migration script that can run within "./manage.py shell" and to a bit
of live database hacking. :)

next.patchew.org seems to be a lot snappier than before, and my worries
about consuming more disk space (because .xz compression is removed)
were unfounded.  A directory with a million files is a bit heavy on the
filesystem, evidently.

The process to deploy these patches is as follows:

- push the first patch, and run the following commands from the shell
  (they will take a long time):

    from scripts import blob_migrate
    blob_migrate.doit()

- back up what's left of the blobs directory

- push the second and third patch (no more slow chown step!)

- run the queries in the commit message of the fourth patch

- push the last two patches as well.

I plan to do this sometime next week on patchew.org.

Paolo

Paolo Bonzini (5):
  models: start moving away from placing mboxes in the filesystem
  models: change ad hoc deblobbing script to migration
  models: confine blob handling to migrations
  models: remove messages for which the blobs were lost
  models: make mbox_bytes non-null

 api/blobs.py                                  | 38 ------------------
 api/migrations/0030_deblob_properties.py      |  4 +-
 api/migrations/0061_message_mbox_bytes.py     | 18 +++++++++
 api/migrations/0062_deblob_messages.py        | 40 +++++++++++++++++++
 api/migrations/0063_remove_broken_messages.py | 19 +++++++++
 api/migrations/0064_auto_20220407_0742.py     | 18 +++++++++
 api/migrations/__init__.py                    | 24 +++++++++--
 api/models.py                                 | 20 ++++------
 8 files changed, 125 insertions(+), 56 deletions(-)
 delete mode 100644 api/blobs.py
 create mode 100644 api/migrations/0061_message_mbox_bytes.py
 create mode 100644 api/migrations/0062_deblob_messages.py
 create mode 100644 api/migrations/0063_remove_broken_messages.py
 create mode 100644 api/migrations/0064_auto_20220407_0742.py

-- 
2.35.1

_______________________________________________
Patchew-devel mailing list
Patchew-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/patchew-devel
Teach the Message class to retrieve the mbox from the database instead
of going to the filesystem.  The enclosed script can be run within
"./manage.py shell" to migrate all messages from the filesystem into
the database while the server is running.
---
 api/blobs.py                              |  8 --------
 api/migrations/0061_message_mbox_bytes.py | 18 ++++++++++++++++
 api/models.py                             | 24 ++++++++++++++--------
 scripts/blob_migrate.py                   | 25 +++++++++++++++++++++++
 4 files changed, 58 insertions(+), 17 deletions(-)
 create mode 100644 api/migrations/0061_message_mbox_bytes.py
 create mode 100644 scripts/blob_migrate.py

diff --git a/api/blobs.py b/api/blobs.py
index XXXXXXX..XXXXXXX 100644
--- a/api/blobs.py
+++ b/api/blobs.py
@@ -XXX,XX +XXX,XX @@ from django.conf import settings
 import lzma


-def save_blob(data, name=None):
-    if not name:
-        name = str(uuid.uuid4())
-    fn = os.path.join(settings.DATA_DIR, "blob", name + ".xz")
-    lzma.open(fn, "w").write(data.encode("utf-8"))
-    return name
-
-
 def load_blob(name):
     fn = os.path.join(settings.DATA_DIR, "blob", name + ".xz")
     return lzma.open(fn, "r").read().decode("utf-8")
diff --git a/api/migrations/0061_message_mbox_bytes.py b/api/migrations/0061_message_mbox_bytes.py
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/api/migrations/0061_message_mbox_bytes.py
@@ -XXX,XX +XXX,XX @@
+# Generated by Django 3.1.14 on 2022-04-06 07:01
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ('api', '0060_auto_20210106_1120'),
+    ]
+
+    operations = [
+        migrations.AddField(
+            model_name='message',
+            name='mbox_bytes',
+            field=models.BinaryField(null=True),
+        ),
+    ]
diff --git a/api/models.py b/api/models.py
index XXXXXXX..XXXXXXX 100644
--- a/api/models.py
+++ b/api/models.py
@@ -XXX,XX +XXX,XX @@ import lzma
 from mbox import MboxMessage, decode_payload
 from patchew.tags import lines_iter
 from event import emit_event, declare_event
-from .blobs import save_blob, load_blob
+from .blobs import delete_blob, load_blob
 import mod


@@ -XXX,XX +XXX,XX @@ class Message(models.Model):
     is_obsolete = models.BooleanField(default=False)
     is_tested = models.BooleanField(default=False)
     is_reviewed = models.BooleanField(default=False)
+    mbox_bytes = models.BinaryField(null=True)

     # is series head if not Null
     topic = models.ForeignKey(
@@ -XXX,XX +XXX,XX @@ class Message(models.Model):
     maintainers = jsonfield.JSONField(blank=True, default=[])
     properties = jsonfield.JSONField(default={})

-    def save_mbox(self, mbox_blob):
-        save_blob(mbox_blob, self.message_id)
+    def save_mbox(self, mbox):
+        mbox_bytes = mbox.encode("utf-8")
+        if self.mbox_bytes is None:
+            delete_blob(self.message_id)
+        self.mbox_bytes = mbox_bytes

     def get_mbox_obj(self):
-        self.get_mbox()
+        if not hasattr(self, "_mbox_obj"):
+            self._mbox_obj = MboxMessage(self.mbox)
         return self._mbox_obj

     def get_mbox(self):
-        if hasattr(self, "mbox_blob"):
-            return self.mbox_blob
-        self.mbox_blob = load_blob(self.message_id)
-        self._mbox_obj = MboxMessage(self.mbox_blob)
-        return self.mbox_blob
+        if not hasattr(self, "_mbox_decoded"):
+            if self.mbox_bytes:
+                self._mbox_decoded = str(self.mbox_bytes, "utf-8")
+            else:
+                self._mbox_decoded = load_blob(self.message_id)
+        return self._mbox_decoded

     mbox = property(get_mbox)

diff --git a/scripts/blob_migrate.py b/scripts/blob_migrate.py
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/scripts/blob_migrate.py
@@ -XXX,XX +XXX,XX @@
+#! /usr/bin/env python3
+
+from api.models import Message, Project
+from django.db import transaction
+
+def doit(n=1000):
+    done = 0
+    for p in Project.objects.all():
+        start = Message.objects.filter(project=p, mbox_bytes=None).order_by("-date").first()
+        while start:
+            first_date = start.date
+            print(done, p, first_date)
+            with transaction.atomic():
+                previously = done
+                q = Message.objects.filter(project=p, date__lte=first_date, mbox_bytes=None).order_by("-date")[:n]
+                for msg in q:
+                    try:
+                        msg.save_mbox(msg.mbox)
+                        msg.save()
+                        done += 1
+                    except Exception as e:
+                        print(msg, type(e))
+                    start = msg
+            if done == previously and start.date == first_date:
+                start = None
-- 
2.35.1
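
The read path introduced by this patch can be sketched outside Django. The following is only an illustration under stated assumptions: BlobStore and the toy Message class are hypothetical stand-ins (not the real api.blobs helpers or the Django model), and the message IDs are made up. It shows the same idea as get_mbox(): prefer the bytes stored alongside the message, fall back to the .xz blob on disk, and cache the decoded string.

```python
import lzma
import os
import tempfile


class BlobStore:
    """Hypothetical stand-in for the old xz-compressed blob directory."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, name):
        return os.path.join(self.directory, name + ".xz")

    def save(self, name, text):
        with lzma.open(self._path(name), "w") as f:
            f.write(text.encode("utf-8"))

    def load(self, name):
        with lzma.open(self._path(name), "r") as f:
            return f.read().decode("utf-8")


class Message:
    """Toy message (not the Django model); mbox_bytes plays the new column."""

    def __init__(self, message_id, store, mbox_bytes=None):
        self.message_id = message_id
        self.store = store
        self.mbox_bytes = mbox_bytes

    @property
    def mbox(self):
        # Decode once and cache, as get_mbox() does with _mbox_decoded.
        if not hasattr(self, "_mbox_decoded"):
            if self.mbox_bytes:
                # Already migrated: the mbox lives in the row itself.
                self._mbox_decoded = str(self.mbox_bytes, "utf-8")
            else:
                # Not migrated yet: fall back to the blob on disk.
                self._mbox_decoded = self.store.load(self.message_id)
        return self._mbox_decoded


with tempfile.TemporaryDirectory() as tmp:
    store = BlobStore(tmp)
    store.save("old@example.com", "From: old\n\nlegacy body\n")
    legacy = Message("old@example.com", store)
    migrated = Message("new@example.com", store,
                       mbox_bytes=b"From: new\n\nnew body\n")
    legacy_text = legacy.mbox
    migrated_text = migrated.mbox
```

Because both kinds of message answer through the same property, callers do not need to know whether a given message has been migrated yet, which is what lets the migration run while the server is up.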
The deblobbing script was written to be invoked while the server is
running.  Change it to a migration for the sake of running it in the
developer setup.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 .../migrations/0062_deblob_messages.py        | 25 +++++++++++++++----
 1 file changed, 20 insertions(+), 5 deletions(-)
 rename scripts/blob_migrate.py => api/migrations/0062_deblob_messages.py (51%)

diff --git a/scripts/blob_migrate.py b/api/migrations/0062_deblob_messages.py
similarity index 51%
rename from scripts/blob_migrate.py
rename to api/migrations/0062_deblob_messages.py
index XXXXXXX..XXXXXXX 100644
--- a/scripts/blob_migrate.py
+++ b/api/migrations/0062_deblob_messages.py
@@ -XXX,XX +XXX,XX @@
 #! /usr/bin/env python3
+from __future__ import unicode_literals

-from api.models import Message, Project
-from django.db import transaction
+from django.db import migrations, transaction

-def doit(n=1000):
+from api import blobs
+
+def deblob_messages(apps, schema_editor):
+    Project = apps.get_model("api", "Project")
+    Message = apps.get_model("api", "Message")
     done = 0
     for p in Project.objects.all():
         start = Message.objects.filter(project=p, mbox_bytes=None).order_by("-date").first()
@@ -XXX,XX +XXX,XX @@ def doit(n=1000):
             print(done, p, first_date)
             with transaction.atomic():
                 previously = done
-                q = Message.objects.filter(project=p, date__lte=first_date, mbox_bytes=None).order_by("-date")[:n]
+                q = Message.objects.filter(project=p, date__lte=first_date, mbox_bytes=None).order_by("-date")[:1000]
                 for msg in q:
                     try:
-                        msg.save_mbox(msg.mbox)
+                        mbox_decoded = blobs.load_blob(msg.message_id)
+                        msg.mbox_bytes = mbox_decoded.encode("utf-8")
                         msg.save()
+                        blobs.delete_blob(msg.message_id)
                         done += 1
                     except Exception as e:
                         print(msg, type(e))
                     start = msg
             if done == previously and start.date == first_date:
                 start = None
+
+class Migration(migrations.Migration):
+
+    dependencies = [("api", "0061_message_mbox_bytes")]
+
+    operations = [
+        migrations.RunPython(deblob_messages, reverse_code=migrations.RunPython.noop)
+    ]
+
-- 
2.35.1
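
The batching idea behind the deblobbing loop can be tried without Django. The sketch below is an assumption-laden illustration, not the migration itself: it uses sqlite3 and a made-up single-table schema (message with id, message_id, mbox_bytes), and it pages through rows by increasing id rather than by descending date as the patch does. It also removes blob files only after the batch has committed, which is a deliberate difference from the patch. Rows whose blob is missing are left NULL.

```python
import lzma
import os
import sqlite3
import tempfile


def blob_path(blob_dir, name):
    return os.path.join(blob_dir, name + ".xz")


def migrate_blobs(conn, blob_dir, batch_size=2):
    """Copy each blob into message.mbox_bytes, one transaction per batch."""
    done = 0
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, message_id FROM message "
            "WHERE mbox_bytes IS NULL AND id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return done
        migrated_paths = []
        with conn:  # commits the batch, or rolls it back on an exception
            for row_id, message_id in rows:
                last_id = row_id  # always advance, so the loop terminates
                path = blob_path(blob_dir, message_id)
                try:
                    with lzma.open(path, "r") as f:
                        data = f.read()
                except FileNotFoundError:
                    continue  # blob was lost: leave the row NULL
                conn.execute(
                    "UPDATE message SET mbox_bytes = ? WHERE id = ?",
                    (data, row_id),
                )
                migrated_paths.append(path)
        # Delete the files only once the database has the data.
        for path in migrated_paths:
            os.remove(path)
        done += len(migrated_paths)


with tempfile.TemporaryDirectory() as blob_dir:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE message "
        "(id INTEGER PRIMARY KEY, message_id TEXT, mbox_bytes BLOB)"
    )
    for name in ["a", "b", "c", "lost"]:
        conn.execute("INSERT INTO message (message_id) VALUES (?)", (name,))
        if name != "lost":
            with lzma.open(blob_path(blob_dir, name), "w") as f:
                f.write(("body of " + name).encode("utf-8"))
    conn.commit()

    migrated = migrate_blobs(conn, blob_dir)
    still_null = conn.execute(
        "SELECT COUNT(*) FROM message WHERE mbox_bytes IS NULL"
    ).fetchone()[0]
    first_body = conn.execute(
        "SELECT mbox_bytes FROM message WHERE message_id = 'a'"
    ).fetchone()[0]
    leftover_files = os.listdir(blob_dir)
    conn.close()
```

Keeping each batch in its own transaction bounds how long locks are held, so the server can keep serving requests between batches.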
Blobs are not used anymore by api.models except for migrating
historical databases.  Move the api.blobs code into the
api.migrations package.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 api/blobs.py                             | 30 ------------------------
 api/migrations/0030_deblob_properties.py |  4 ++--
 api/migrations/0062_deblob_messages.py   |  6 ++---
 api/migrations/__init__.py               | 24 +++++++++++++++----
 api/models.py                            | 16 +++----------
 5 files changed, 28 insertions(+), 52 deletions(-)
 delete mode 100644 api/blobs.py

diff --git a/api/blobs.py b/api/blobs.py
deleted file mode 100644
index XXXXXXX..XXXXXXX
--- a/api/blobs.py
+++ /dev/null
@@ -XXX,XX +XXX,XX @@
-#!/usr/bin/env python3
-#
-# Copyright 2016, 2018 Red Hat, Inc.
-#
-# Authors:
-#     Fam Zheng <famz@redhat.com>
-#     Paolo Bonzini <pbonzini@redhat.com>
-#
-# This work is licensed under the MIT License.  Please see the LICENSE file or
-# http://opensource.org/licenses/MIT.
-
-
-import os
-import uuid
-
-from django.conf import settings
-import lzma
-
-
-def load_blob(name):
-    fn = os.path.join(settings.DATA_DIR, "blob", name + ".xz")
-    return lzma.open(fn, "r").read().decode("utf-8")
-
-
-def delete_blob(name):
-    fn = os.path.join(settings.DATA_DIR, "blob", name + ".xz")
-    try:
-        os.remove(fn)
-    except FileNotFoundError:
-        pass
diff --git a/api/migrations/0030_deblob_properties.py b/api/migrations/0030_deblob_properties.py
index XXXXXXX..XXXXXXX 100644
--- a/api/migrations/0030_deblob_properties.py
+++ b/api/migrations/0030_deblob_properties.py
@@ -XXX,XX +XXX,XX @@
 from __future__ import unicode_literals

 from django.db import migrations
-from api import blobs
+from . import load_blob


 def deblob_properties(apps, schema_editor):
@@ -XXX,XX +XXX,XX @@ def deblob_properties(apps, schema_editor):
         for obj in objects:
             obj.blob = False
             if obj.value is not None:
-                obj.value = blobs.load_blob(obj.value)
+                obj.value = load_blob(obj.value)
             obj.save()

     # We can't import the models directly as they may be a newer
diff --git a/api/migrations/0062_deblob_messages.py b/api/migrations/0062_deblob_messages.py
index XXXXXXX..XXXXXXX 100644
--- a/api/migrations/0062_deblob_messages.py
+++ b/api/migrations/0062_deblob_messages.py
@@ -XXX,XX +XXX,XX @@ from __future__ import unicode_literals

 from django.db import migrations, transaction

-from api import blobs
+from . import load_blob, delete_blob

 def deblob_messages(apps, schema_editor):
     Project = apps.get_model("api", "Project")
@@ -XXX,XX +XXX,XX @@ def deblob_messages(apps, schema_editor):
                 q = Message.objects.filter(project=p, date__lte=first_date, mbox_bytes=None).order_by("-date")[:1000]
                 for msg in q:
                     try:
-                        mbox_decoded = blobs.load_blob(msg.message_id)
+                        mbox_decoded = load_blob(msg.message_id)
                         msg.mbox_bytes = mbox_decoded.encode("utf-8")
                         msg.save()
-                        blobs.delete_blob(msg.message_id)
+                        delete_blob(msg.message_id)
                         done += 1
                     except Exception as e:
                         print(msg, type(e))
diff --git a/api/migrations/__init__.py b/api/migrations/__init__.py
index XXXXXXX..XXXXXXX 100644
--- a/api/migrations/__init__.py
+++ b/api/migrations/__init__.py
@@ -XXX,XX +XXX,XX @@
 #!/usr/bin/env python3
 #
-# Copyright 2018 Red Hat, Inc.
+# Copyright 2016, 2018 Red Hat, Inc.
 #
 # Authors:
+#     Fam Zheng <famz@redhat.com>
 #     Paolo Bonzini <pbonzini@redhat.com>
 #
 # This work is licensed under the MIT License.  Please see the LICENSE file or
 # http://opensource.org/licenses/MIT.

+from django.conf import settings
 from django.db import migrations
 import json
-from api import blobs
+import lzma
+import os
+import uuid


+def load_blob(name):
+    fn = os.path.join(settings.DATA_DIR, "blob", name + ".xz")
+    return lzma.open(fn, "r").read().decode("utf-8")
+
+
+def delete_blob(name):
+    fn = os.path.join(settings.DATA_DIR, "blob", name + ".xz")
+    try:
+        os.remove(fn)
+    except FileNotFoundError:
+        pass
+
 def load_blob_json_safe(name):
     try:
-        return json.loads(blobs.load_blob(name))
+        return json.loads(load_blob(name))
     except Exception as e:
         return "Failed to load blob %s: %s" % (name, e)

@@ -XXX,XX +XXX,XX @@ def get_property(model, name, **kwargs):
 def delete_property_blob(model, name, **kwargs):
     mp = get_property_raw(model, name, **kwargs)
     if hasattr(mp, "blob") and mp.blob:
-        blobs.delete_blob(mp.value)
+        delete_blob(mp.value)


 def set_property(model, name, value, **kwargs):
diff --git a/api/models.py b/api/models.py
index XXXXXXX..XXXXXXX 100644
--- a/api/models.py
+++ b/api/models.py
@@ -XXX,XX +XXX,XX @@ import lzma
 from mbox import MboxMessage, decode_payload
 from patchew.tags import lines_iter
 from event import emit_event, declare_event
-from .blobs import delete_blob, load_blob
 import mod


@@ -XXX,XX +XXX,XX @@ class MessageManager(models.Manager):
         msg.is_patch = m.is_patch()
         msg.patch_num = m.get_num()[0]
         msg.project = project
-        msg.save_mbox(mbox)
+        msg.mbox_bytes = mbox.encode("utf-8")
         msg.save()
         emit_event("MessageAdded", message=msg)
         self.update_series(msg)
@@ -XXX,XX +XXX,XX @@ class MessageManager(models.Manager):
             msg.project = p
             if self.filter(message_id=msgid, project__name=p.name).first():
                 raise self.DuplicateMessageError(msgid)
-            msg.save_mbox(mbox)
+            msg.mbox_bytes = mbox.encode("utf-8")
             msg.save()
             emit_event("MessageAdded", message=msg)
             self.update_series(msg)
@@ -XXX,XX +XXX,XX @@ class Message(models.Model):
     maintainers = jsonfield.JSONField(blank=True, default=[])
     properties = jsonfield.JSONField(default={})

-    def save_mbox(self, mbox):
-        mbox_bytes = mbox.encode("utf-8")
-        if self.mbox_bytes is None:
-            delete_blob(self.message_id)
-        self.mbox_bytes = mbox_bytes
-
     def get_mbox_obj(self):
         if not hasattr(self, "_mbox_obj"):
             self._mbox_obj = MboxMessage(self.mbox)
@@ -XXX,XX +XXX,XX @@ class Message(models.Model):

     def get_mbox(self):
         if not hasattr(self, "_mbox_decoded"):
-            if self.mbox_bytes:
-                self._mbox_decoded = str(self.mbox_bytes, "utf-8")
-            else:
-                self._mbox_decoded = load_blob(self.message_id)
+            self._mbox_decoded = str(self.mbox_bytes, "utf-8")
         return self._mbox_decoded

     mbox = property(get_mbox)
-- 
2.35.1
Messages that do not have an associated blob (typically because their
message-id included a slash) have never worked.  Remove them from the
database.

This migration can also be performed directly on the database while
the server is running:

    update api_topic set latest_id = null
        where latest_id in (select id from api_message where mbox_bytes is null);
    delete from api_messageresult
        where message_id in (select id from api_message where mbox_bytes is null);
    delete from api_message where mbox_bytes is null;
    delete from api_topic
        where not exists (select * from api_message where api_message.topic_id = api_topic.id);

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 api/migrations/0063_remove_broken_messages.py | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
 create mode 100644 api/migrations/0063_remove_broken_messages.py

diff --git a/api/migrations/0063_remove_broken_messages.py b/api/migrations/0063_remove_broken_messages.py
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/api/migrations/0063_remove_broken_messages.py
@@ -XXX,XX +XXX,XX @@
+#! /usr/bin/env python3
+from __future__ import unicode_literals
+
+from django.db import migrations, transaction
+
+def remove_null_mbox_messages(apps, schema_editor):
+    Message = apps.get_model("api", "Message")
+    Topic = apps.get_model("api", "Topic")
+    Topic.objects.filter(latest__mbox_bytes=None).update(latest=None)
+    Message.objects.filter(mbox_bytes=None).delete()
+
+class Migration(migrations.Migration):
+
+    dependencies = [("api", "0062_deblob_messages")]
+
+    operations = [
+        migrations.RunPython(remove_null_mbox_messages, reverse_code=migrations.RunPython.noop)
+    ]
+
-- 
2.35.1
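
The effect of the four queries in the commit message can be checked on a toy database. This is a minimal sketch, assuming an in-memory SQLite database and a drastically reduced schema that contains only the columns the queries mention; the real patchew tables have many more columns and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE api_topic (id INTEGER PRIMARY KEY, latest_id INTEGER);
CREATE TABLE api_message (id INTEGER PRIMARY KEY, topic_id INTEGER,
                          mbox_bytes BLOB);
CREATE TABLE api_messageresult (id INTEGER PRIMARY KEY, message_id INTEGER);

-- Topic 1 has a healthy message; topic 2 only has a message without mbox.
INSERT INTO api_topic (id, latest_id) VALUES (1, 10), (2, 20);
INSERT INTO api_message (id, topic_id, mbox_bytes) VALUES
    (10, 1, x'68656c6c6f'),
    (20, 2, NULL);
INSERT INTO api_messageresult (id, message_id) VALUES (100, 10), (200, 20);
""")

# The four statements from the commit message, in the same order: references
# to the broken messages are cleared before the messages themselves go away.
CLEANUP = [
    "update api_topic set latest_id = null where latest_id in "
    "(select id from api_message where mbox_bytes is null)",
    "delete from api_messageresult where message_id in "
    "(select id from api_message where mbox_bytes is null)",
    "delete from api_message where mbox_bytes is null",
    "delete from api_topic where not exists "
    "(select * from api_message where api_message.topic_id = api_topic.id)",
]

with conn:  # run all four statements in a single transaction
    for statement in CLEANUP:
        conn.execute(statement)

topics = [r[0] for r in conn.execute("SELECT id FROM api_topic ORDER BY id")]
messages = [r[0] for r in conn.execute("SELECT id FROM api_message ORDER BY id")]
results = [r[0] for r in conn.execute("SELECT id FROM api_messageresult ORDER BY id")]
conn.close()
```

The order matters: deleting the messages first would leave api_topic.latest_id and api_messageresult.message_id pointing at rows that no longer exist.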
The mbox_bytes field can no longer be NULL; adjust it in the database.
---
 api/migrations/0064_auto_20220407_0742.py | 18 ++++++++++++++++++
 api/models.py                             |  2 +-
 2 files changed, 19 insertions(+), 1 deletion(-)
 create mode 100644 api/migrations/0064_auto_20220407_0742.py

diff --git a/api/migrations/0064_auto_20220407_0742.py b/api/migrations/0064_auto_20220407_0742.py
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/api/migrations/0064_auto_20220407_0742.py
@@ -XXX,XX +XXX,XX @@
+# Generated by Django 3.1.14 on 2022-04-07 07:42
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ('api', '0063_remove_broken_messages'),
+    ]
+
+    operations = [
+        migrations.AlterField(
+            model_name='message',
+            name='mbox_bytes',
+            field=models.BinaryField(),
+        ),
+    ]
diff --git a/api/models.py b/api/models.py
index XXXXXXX..XXXXXXX 100644
--- a/api/models.py
+++ b/api/models.py
@@ -XXX,XX +XXX,XX @@ class Message(models.Model):
     is_obsolete = models.BooleanField(default=False)
     is_tested = models.BooleanField(default=False)
     is_reviewed = models.BooleanField(default=False)
-    mbox_bytes = models.BinaryField(null=True)
+    mbox_bytes = models.BinaryField()

     # is series head if not Null
     topic = models.ForeignKey(
-- 
2.35.1