From: Paolo Bonzini
To: patchew-devel@redhat.com
Date: Tue, 22 Feb 2022 12:00:43 +0100
Message-Id: <20220222110044.173882-1-pbonzini@redhat.com>
Subject: [Patchew-devel] [PATCH] search: treat dot and slash as word separators

Postgres full text search is pretty strict about only searching for exact
word matches.  For patches it is useful to add a word break on dots and
slashes, but unfortunately it is not really possible to teach it what
delimiters to use.

One possibility could be to include the text search vector directly in the
database using a trigger, but that is a bit hard to do with Django, so just
create a new field that is used for the full text search.

Signed-off-by: Paolo Bonzini
---
 .../0061_message_searched_subject.py          | 19 +++++++++++++++++++
 .../0062_populate_searched_subject.py         | 18 ++++++++++++++++++
 .../0063_postgres_fts_searched_subject.py     | 19 +++++++++++++++++++
 api/models.py                                 |  4 ++++
 api/search.py                                 |  2 +-
 5 files changed, 61 insertions(+), 1 deletion(-)
 create mode 100644 api/migrations/0061_message_searched_subject.py
 create mode 100644 api/migrations/0062_populate_searched_subject.py
 create mode 100644 api/migrations/0063_postgres_fts_searched_subject.py

diff --git a/api/migrations/0061_message_searched_subject.py b/api/migrations/0061_message_searched_subject.py
new file mode 100644
index 0000000..7edc67d
--- /dev/null
+++ b/api/migrations/0061_message_searched_subject.py
@@ -0,0 +1,19 @@
+# Generated by Django 3.1.14 on 2022-02-22 10:45
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ('api', '0060_auto_20210106_1120'),
+    ]
+
+    operations = [
+        migrations.AddField(
+            model_name='message',
+            name='searched_subject',
+            field=models.CharField(default='', max_length=4096),
+            preserve_default=False,
+        ),
+    ]
diff --git a/api/migrations/0062_populate_searched_subject.py b/api/migrations/0062_populate_searched_subject.py
new file mode 100644
index 0000000..0f2812c
--- /dev/null
+++ b/api/migrations/0062_populate_searched_subject.py
@@ -0,0 +1,18 @@
+# -*- coding: utf-8 -*-
+from __future__ import unicode_literals
+
+from django.db import migrations
+from django.db.models import Q
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ('api', '0061_message_searched_subject'),
+    ]
+
+    operations = [
+        migrations.RunSQL(
+            "update api_message set searched_subject = replace(replace(subject, '.', ' '), '/', ' ')"
+        )
+
+    ]
diff --git a/api/migrations/0063_postgres_fts_searched_subject.py b/api/migrations/0063_postgres_fts_searched_subject.py
new file mode 100644
index 0000000..557bd9c
--- /dev/null
+++ b/api/migrations/0063_postgres_fts_searched_subject.py
@@ -0,0 +1,19 @@
+# -*- coding: utf-8 -*-
+from __future__ import unicode_literals
+
+from django.db import migrations
+
+from api.migrations import PostgresOnlyMigration
+
+class Migration(PostgresOnlyMigration):
+
+    dependencies = [
+        ('api', '0062_populate_searched_subject'),
+    ]
+
+    operations = [
+        migrations.RunSQL("create index api_message_searched_subject_gin on api_message using gin(to_tsvector('english', searched_subject::text));",
+                          "drop index api_message_searched_subject_gin"),
+        migrations.RunSQL("drop index api_message_subject_gin",
+                          "create index api_message_subject_gin on api_message using gin(to_tsvector('english', subject::text));"),
+    ]
diff --git a/api/models.py b/api/models.py
index 18392b1..3edb4ca 100644
--- a/api/models.py
+++ b/api/models.py
@@ -472,6 +472,9 @@ class MessageManager(models.Manager):
             if "in_reply_to" not in validated_data:
                 msg.in_reply_to = m.get_in_reply_to() or ""
             msg.stripped_subject = m.get_subject(strip_tags=True)
+            msg.searched_subject = msg.subject \
+                .replace(".", " ") \
+                .replace("/", " ")
             msg.version = m.get_version()
             msg.prefixes = m.get_prefixes()
             if m.is_series_head():
@@ -596,6 +599,7 @@ class Message(models.Model):
     last_comment_date = models.DateTimeField(db_index=True, null=True)
     subject = HeaderFieldModel()
     stripped_subject = HeaderFieldModel(db_index=True)
+    searched_subject = HeaderFieldModel()
     version = models.PositiveSmallIntegerField(default=0)
     sender = jsonfield.JSONCharField(max_length=4096, db_index=True)
     recipients = jsonfield.JSONField()
diff --git a/api/search.py b/api/search.py
index 08b40c3..c3d75e5 100644
--- a/api/search.py
+++ b/api/search.py
@@ -411,7 +411,7 @@ Search text keyword in the email message. Example:
         if self._last_keywords:
             if connection.vendor == "postgresql":
                 queryset = queryset.annotate(
-                    subjsearch=NonNullSearchVector("subject", config="english")
+                    subjsearch=NonNullSearchVector("searched_subject", config="english")
                 )
                 searchq = reduce(
                     lambda x, y: x & y,
-- 
2.34.1
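
To illustrate the commit message, here is a minimal Python sketch of the
normalization the patch applies when it fills searched_subject.  The helper
names to_searched_subject and matches_all_words are hypothetical and not part
of Patchew, and the whitespace-based matcher is only a stand-in for
exact-word matching, not PostgreSQL's actual text search parser.

```python
def to_searched_subject(subject: str) -> str:
    """Replace dots and slashes with spaces, mirroring the patch's
    msg.subject.replace(".", " ").replace("/", " ")."""
    return subject.replace(".", " ").replace("/", " ")


def matches_all_words(text: str, query: str) -> bool:
    """Toy exact-word matcher (an assumption for illustration only):
    every query word must equal one whitespace-separated word of text."""
    words = {w.lower() for w in text.split()}
    return all(q.lower() in words for q in query.split())


subject = "[PATCH] hw/arm/virt.c: fix typo"
searched = to_searched_subject(subject)

# The path is one "word" in the raw subject, so "virt" is not found there,
# but it becomes a separate word once dots and slashes are turned into spaces.
print(searched)
print(matches_all_words(subject, "virt"))
print(matches_all_words(searched, "virt"))
```

Keeping the normalized text in its own column leaves the original subject
untouched for display, while the search vector and the GIN index are built
from the normalized copy.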