.../0061_message_searched_subject.py | 19 +++++++++++++++++++ .../0062_populate_searched_subject.py | 18 ++++++++++++++++++ .../0063_postgres_fts_searched_subject.py | 19 +++++++++++++++++++ api/models.py | 4 ++++ api/search.py | 2 +- 5 files changed, 61 insertions(+), 1 deletion(-) create mode 100644 api/migrations/0061_message_searched_subject.py create mode 100644 api/migrations/0062_populate_searched_subject.py create mode 100644 api/migrations/0063_postgres_fts_searched_subject.py
Postgres full text search is pretty strict about only searching for exact word matches.
For patches it is useful to add a word break on dots and slashes, but unfortunately
it is not really possible to teach it what delimiters to use.
One possibility could be to include the text search vector directly in the database
using a trigger, but that is a bit hard to do with django, so just create a new
field that is used for the full text search.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
.../0061_message_searched_subject.py | 19 +++++++++++++++++++
.../0062_populate_searched_subject.py | 18 ++++++++++++++++++
.../0063_postgres_fts_searched_subject.py | 19 +++++++++++++++++++
api/models.py | 4 ++++
api/search.py | 2 +-
5 files changed, 61 insertions(+), 1 deletion(-)
create mode 100644 api/migrations/0061_message_searched_subject.py
create mode 100644 api/migrations/0062_populate_searched_subject.py
create mode 100644 api/migrations/0063_postgres_fts_searched_subject.py
diff --git a/api/migrations/0061_message_searched_subject.py b/api/migrations/0061_message_searched_subject.py
new file mode 100644
index 0000000..7edc67d
--- /dev/null
+++ b/api/migrations/0061_message_searched_subject.py
@@ -0,0 +1,19 @@
+# Generated by Django 3.1.14 on 2022-02-22 10:45
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+ dependencies = [
+ ('api', '0060_auto_20210106_1120'),
+ ]
+
+ operations = [
+ migrations.AddField(
+ model_name='message',
+ name='searched_subject',
+ field=models.CharField(default='', max_length=4096),
+ preserve_default=False,
+ ),
+ ]
diff --git a/api/migrations/0062_populate_searched_subject.py b/api/migrations/0062_populate_searched_subject.py
new file mode 100644
index 0000000..0f2812c
--- /dev/null
+++ b/api/migrations/0062_populate_searched_subject.py
@@ -0,0 +1,18 @@
+# -*- coding: utf-8 -*-
+from __future__ import unicode_literals
+
+from django.db import migrations
+from django.db.models import Q
+
+class Migration(migrations.Migration):
+
+ dependencies = [
+ ('api', '0061_message_searched_subject'),
+ ]
+
+ operations = [
+ migrations.RunSQL(
+ 'update api_message set searched_subject = replace(replace(subject, ".", " "), "/", " ") where subject like "%.%" or subject like "%/%"'
+ )
+
+ ]
diff --git a/api/migrations/0063_postgres_fts_searched_subject.py b/api/migrations/0063_postgres_fts_searched_subject.py
new file mode 100644
index 0000000..557bd9c
--- /dev/null
+++ b/api/migrations/0063_postgres_fts_searched_subject.py
@@ -0,0 +1,19 @@
+# -*- coding: utf-8 -*-
+from __future__ import unicode_literals
+
+from django.db import migrations
+
+from api.migrations import PostgresOnlyMigration
+
+class Migration(PostgresOnlyMigration):
+
+ dependencies = [
+ ('api', '0062_populate_searched_subject'),
+ ]
+
+ operations = [
+ migrations.RunSQL("create index api_message_searched_subject_gin on api_message using gin(to_tsvector('english', searched_subject::text));",
+ "drop index api_message_searched_subject_gin"),
+ migrations.RunSQL("drop index api_message_subject_gin",
+ "create index api_message_subject_gin on api_message using gin(to_tsvector('english', subject::text));"),
+ ]
diff --git a/api/models.py b/api/models.py
index 18392b1..3edb4ca 100644
--- a/api/models.py
+++ b/api/models.py
@@ -472,6 +472,9 @@ class MessageManager(models.Manager):
if "in_reply_to" not in validated_data:
msg.in_reply_to = m.get_in_reply_to() or ""
msg.stripped_subject = m.get_subject(strip_tags=True)
+ msg.searched_subject = msg.subject \
+ .replace(".", " ") \
+ .replace("/", " ")
msg.version = m.get_version()
msg.prefixes = m.get_prefixes()
if m.is_series_head():
@@ -596,6 +599,7 @@ class Message(models.Model):
last_comment_date = models.DateTimeField(db_index=True, null=True)
subject = HeaderFieldModel()
stripped_subject = HeaderFieldModel(db_index=True)
+ searched_subject = HeaderFieldModel()
version = models.PositiveSmallIntegerField(default=0)
sender = jsonfield.JSONCharField(max_length=4096, db_index=True)
recipients = jsonfield.JSONField()
diff --git a/api/search.py b/api/search.py
index 08b40c3..c3d75e5 100644
--- a/api/search.py
+++ b/api/search.py
@@ -411,7 +411,7 @@ Search text keyword in the email message. Example:
if self._last_keywords:
if connection.vendor == "postgresql":
queryset = queryset.annotate(
- subjsearch=NonNullSearchVector("subject", config="english")
+ subjsearch=NonNullSearchVector("searched_subject", config="english")
)
searchq = reduce(
lambda x, y: x & y,
--
2.34.1
_______________________________________________
Patchew-devel mailing list
Patchew-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/patchew-devel
On 2/22/22 12:00, Paolo Bonzini wrote: > Postgres full text search is pretty strict about only searching for exact word matches. > For patches it is useful to add a word break on dots and slashes, but unfortunately > it is not really possible to teach it what delimiters to use. > > One possibility could be to include the text search vector directly in the database > using a trigger, but that is a bit hard to do with django, so just create a new > field that is used for the full text search. > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Unfortunately this broke horribly when applied to next.patchew.org. I'm not sure why, but the "UPDATE" takes an incredibly long time even if it's just for 10000 rows. Paolo > --- > .../0061_message_searched_subject.py | 19 +++++++++++++++++++ > .../0062_populate_searched_subject.py | 18 ++++++++++++++++++ > .../0063_postgres_fts_searched_subject.py | 19 +++++++++++++++++++ > api/models.py | 4 ++++ > api/search.py | 2 +- > 5 files changed, 61 insertions(+), 1 deletion(-) > create mode 100644 api/migrations/0061_message_searched_subject.py > create mode 100644 api/migrations/0062_populate_searched_subject.py > create mode 100644 api/migrations/0063_postgres_fts_searched_subject.py > > diff --git a/api/migrations/0061_message_searched_subject.py b/api/migrations/0061_message_searched_subject.py > new file mode 100644 > index 0000000..7edc67d > --- /dev/null > +++ b/api/migrations/0061_message_searched_subject.py > @@ -0,0 +1,19 @@ > +# Generated by Django 3.1.14 on 2022-02-22 10:45 > + > +from django.db import migrations, models > + > + > +class Migration(migrations.Migration): > + > + dependencies = [ > + ('api', '0060_auto_20210106_1120'), > + ] > + > + operations = [ > + migrations.AddField( > + model_name='message', > + name='searched_subject', > + field=models.CharField(default='', max_length=4096), > + preserve_default=False, > + ), > + ] > diff --git a/api/migrations/0062_populate_searched_subject.py b/api/migrations/0062_populate_searched_subject.py > new file mode 100644 > index 0000000..0f2812c > --- /dev/null > +++ b/api/migrations/0062_populate_searched_subject.py > @@ -0,0 +1,18 @@ > +# -*- coding: utf-8 -*- > +from __future__ import unicode_literals > + > +from django.db import migrations > +from django.db.models import Q > + > +class Migration(migrations.Migration): > + > + dependencies = [ > + ('api', '0061_message_searched_subject'), > + ] > + > + operations = [ > + migrations.RunSQL( > + 'update api_message set searched_subject = replace(replace(subject, ".", " "), "/", " ") where subject like "%.%" or subject like "%/%"' > + ) > + > + ] > diff --git a/api/migrations/0063_postgres_fts_searched_subject.py b/api/migrations/0063_postgres_fts_searched_subject.py > new file mode 100644 > index 0000000..557bd9c > --- /dev/null > +++ b/api/migrations/0063_postgres_fts_searched_subject.py > @@ -0,0 +1,19 @@ > +# -*- coding: utf-8 -*- > +from __future__ import unicode_literals > + > +from django.db import migrations > + > +from api.migrations import PostgresOnlyMigration > + > +class Migration(PostgresOnlyMigration): > + > + dependencies = [ > + ('api', '0062_populate_searched_subject'), > + ] > + > + operations = [ > + migrations.RunSQL("create index api_message_searched_subject_gin on api_message using gin(to_tsvector('english', searched_subject::text));", > + "drop index api_message_searched_subject_gin"), > + migrations.RunSQL("drop index api_message_subject_gin", > + "create index api_message_subject_gin on api_message using gin(to_tsvector('english', subject::text));"), > + ] > diff --git a/api/models.py b/api/models.py > index 18392b1..3edb4ca 100644 > --- a/api/models.py > +++ b/api/models.py > @@ -472,6 +472,9 @@ class MessageManager(models.Manager): > if "in_reply_to" not in validated_data: > msg.in_reply_to = m.get_in_reply_to() or "" > msg.stripped_subject = m.get_subject(strip_tags=True) > + msg.searched_subject = msg.subject \ > + .replace(".", " ") \ > + .replace("/", " ") > msg.version = m.get_version() > msg.prefixes = m.get_prefixes() > if m.is_series_head(): > @@ -596,6 +599,7 @@ class Message(models.Model): > last_comment_date = models.DateTimeField(db_index=True, null=True) > subject = HeaderFieldModel() > stripped_subject = HeaderFieldModel(db_index=True) > + searched_subject = HeaderFieldModel() > version = models.PositiveSmallIntegerField(default=0) > sender = jsonfield.JSONCharField(max_length=4096, db_index=True) > recipients = jsonfield.JSONField() > diff --git a/api/search.py b/api/search.py > index 08b40c3..c3d75e5 100644 > --- a/api/search.py > +++ b/api/search.py > @@ -411,7 +411,7 @@ Search text keyword in the email message. Example: > if self._last_keywords: > if connection.vendor == "postgresql": > queryset = queryset.annotate( > - subjsearch=NonNullSearchVector("subject", config="english") > + subjsearch=NonNullSearchVector("searched_subject", config="english") > ) > searchq = reduce( > lambda x, y: x & y, _______________________________________________ Patchew-devel mailing list Patchew-devel@redhat.com https://listman.redhat.com/mailman/listinfo/patchew-devel
On 2022-02-22 14:57, Paolo Bonzini wrote: > On 2/22/22 12:00, Paolo Bonzini wrote: > > Postgres full text search is pretty strict about only searching for exact word matches. > > For patches it is useful to add a word break on dots and slashes, but unfortunately > > it is not really possible to teach it what delimiters to use. > > > > One possibility could be to include the text search vector directly in the database > > using a trigger, but that is a bit hard to do with django, so just create a new > > field that is used for the full text search. > > > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > > Unfortunately this broke horribly when applied to next.patchew.org. I'm not > sure why, but the "UPDATE" takes an incredibly long time even if it's just > for 10000 rows. > > Paolo Oops. Maybe the VM is overwhelmed by IO? It seems to be up now, have you aborted the change? The HTTP 500 downtime we experience for every code change is quite bad. I'm inclined we switch to a load balanced server using k3s (single node) and run as Kubernetes Deployment. It has some quite noticable memory tax (500MB to 1G) is nicer in many ways. It will take some work to chance but I can manage. What do you think? Fam _______________________________________________ Patchew-devel mailing list Patchew-devel@redhat.com https://listman.redhat.com/mailman/listinfo/patchew-devel
On 2/22/22 15:06, Fam Zheng wrote: >> Unfortunately this broke horribly when applied to next.patchew.org. I'm not >> sure why, but the "UPDATE" takes an incredibly long time even if it's just >> for 10000 rows. > > Oops. Maybe the VM is overwhelmed by IO? It seems to be up now, have you > aborted the change? Yes, I reverted it and undid the django migration (with ALTER TABLE api_messages DROP COLUMN + DELETE FROM django_migrations WHERE ...) by hand. > The HTTP 500 downtime we experience for every code change is quite bad. I'm > inclined we switch to a load balanced server using k3s (single node) and run as > Kubernetes Deployment. It has some quite noticable memory tax (500MB to 1G) is > nicer in many ways. It will take some work to chance but I can manage. Most of the downtime is for chown -R of the mail data, is that actually necessary? But really I think we should just make the cron deployment to patchew.org for master only run once a day. That said: 1) we should set up next.patchew.org's importer to bring at least all of QEMU, so that we have some recent useful data 2) we should also set up an alternate applier for next.patchew.org, again at least for QEMU 3) if you want, I have Azure credits that were given to QEMU, and we can use them for a Patchew instance as well since it uses QEMU. I'm not sure how hard it is to set up our services on AKS, I can start with the next.patchew.org importer next Friday, as I have a day of learning... Paolo _______________________________________________ Patchew-devel mailing list Patchew-devel@redhat.com https://listman.redhat.com/mailman/listinfo/patchew-devel
On 2022-02-22 17:22, Paolo Bonzini wrote: > On 2/22/22 15:06, Fam Zheng wrote: > > > Unfortunately this broke horribly when applied to next.patchew.org. I'm not > > > sure why, but the "UPDATE" takes an incredibly long time even if it's just > > > for 10000 rows. > > > > Oops. Maybe the VM is overwhelmed by IO? It seems to be up now, have you > > aborted the change? > > Yes, I reverted it and undid the django migration (with ALTER TABLE > api_messages DROP COLUMN + DELETE FROM django_migrations WHERE ...) by hand. > > > The HTTP 500 downtime we experience for every code change is quite bad. I'm > > inclined we switch to a load balanced server using k3s (single node) and run as > > Kubernetes Deployment. It has some quite noticable memory tax (500MB to 1G) is > > nicer in many ways. It will take some work to chance but I can manage. > > Most of the downtime is for chown -R of the mail data, is that actually > necessary? Likely redundant now. But the problem still exists for long migrations. It's not a big step to taking advantage of kubernetes from where we are. (I already have a downstream deployment done that way.) > But really I think we should just make the cron deployment to > patchew.org for master only run once a day. > > That said: > > 1) we should set up next.patchew.org's importer to bring at least all of > QEMU, so that we have some recent useful data > > 2) we should also set up an alternate applier for next.patchew.org, again at > least for QEMU I'd like to do it, but the disk space on next.patchew.org is quite tight. > > 3) if you want, I have Azure credits that were given to QEMU, and we can use > them for a Patchew instance as well since it uses QEMU. I'm not sure how > hard it is to set up our services on AKS, I can start with the > next.patchew.org importer next Friday, as I have a day of learning... AKS sounds great. I think we can first move importers and appliers there. Their Deployment YAML will be fairly straightforward; for the container registry we can use GitLab or just push as docker public image. Fam _______________________________________________ Patchew-devel mailing list Patchew-devel@redhat.com https://listman.redhat.com/mailman/listinfo/patchew-devel
On 2/22/22 17:39, Fam Zheng wrote: >> 3) if you want, I have Azure credits that were given to QEMU, and we can use >> them for a Patchew instance as well since it uses QEMU. I'm not sure how >> hard it is to set up our services on AKS, I can start with the >> next.patchew.org importer next Friday, as I have a day of learning... > > AKS sounds great. I think we can first move importers and appliers there. > Their Deployment YAML will be fairly straightforward; for the container > registry we can use GitLab or just push as docker public image. Ok, so we'll need 2x3 docker public images for {next,master} x {importer,applier,new-importer}. I'll use either quay.io or azure container registry for now, then we can decide. Later we can add server and db as well. Paolo _______________________________________________ Patchew-devel mailing list Patchew-devel@redhat.com https://listman.redhat.com/mailman/listinfo/patchew-devel
On 2022-02-22 17:47, Paolo Bonzini wrote: > On 2/22/22 17:39, Fam Zheng wrote: > > > 3) if you want, I have Azure credits that were given to QEMU, and we can use > > > them for a Patchew instance as well since it uses QEMU. I'm not sure how > > > hard it is to set up our services on AKS, I can start with the > > > next.patchew.org importer next Friday, as I have a day of learning... > > > > AKS sounds great. I think we can first move importers and appliers there. > > Their Deployment YAML will be fairly straightforward; for the container > > registry we can use GitLab or just push as docker public image. > > Ok, so we'll need 2x3 docker public images for {next,master} x > {importer,applier,new-importer}. I'll use either quay.io or azure container > registry for now, then we can decide. Later we can add server and db as > well. No problem. Making kubelet pull from private registries is a little painful [1]. Alternatively, we can also use a public container, do git-pull in the container [2] to avoid using registry at all. This will greatly simplify the CI/CD script in the patchew repo (just a kubectl apply, no more docker-build and docker-pull). [1]: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/ [2]: example YAML for importer (modified from my local version, untested) --- apiVersion: v1 kind: PersistentVolumeClaim metadata: name: importer-data spec: accessModes: - ReadWriteOnce resources: requests: storage: 20Gi --- apiVersion: apps/v1 kind: Deployment metadata: labels: app: importer name: importer spec: replicas: 1 selector: matchLabels: app: importer template: metadata: labels: app: importer spec: volumes: - name: data persistentVolumeClaim: claimName: importer-data containers: - image: alpine:3.15 imagePullPolicy: IfNotPresent name: importer volumeMounts: - mountPath: "/data/patchew" name: data command: /bin/sh env: - name: PATCHEW_BRANCH value: next args: - "-c" - | set -e apk add git py3-pip git clone https://github.com/patchew-project/patchew -b $PATCHEW_BRANCH /opt/patchew cd /opt/patchew ... _______________________________________________ Patchew-devel mailing list Patchew-devel@redhat.com https://listman.redhat.com/mailman/listinfo/patchew-devel
© 2016 - 2025 Red Hat, Inc.