By Lloyd Powell


2019-07-08 14:54:21 8 Comments

I'm using pg_trgm to run a query, but I'm getting a lot of 1 matches when searching with a symbol difference. I have the following query:

SELECT my_column, similarity('$ Hello', my_column) AS sml
FROM my_table
WHERE my_column % '$ Hello'
ORDER BY sml DESC, my_column;

In my_table, I have the following:

- Hello
? Hello
| Hello
$ Hello
! Hello
!? Hello

They all return with a similarity match of 1. Do I need to escape the '$' or something similar?

3 comments

@Erwin Brandstetter 2019-07-08 23:49:19

Other answers clarified that trigram similarity is based on alphanumeric characters only. That's why all your examples match 100 %.

You may still be able to make use of a trigram GiST or GIN index and establish your desired sort order with additional ORDER BY expressions. For your demonstrated case:

SELECT my_column, similarity('$ Hello', my_column) AS sml
FROM   my_table
WHERE  my_column % '$ Hello'
ORDER  BY sml DESC
        , my_column <> '$ Hello'       -- !
        , my_column;

The boolean expression my_column <> '$ Hello' evaluates to FALSE, TRUE, or NULL - with this sort order. So the exact match (considering all characters) comes first. And this query can still use a trigram index. A trigram GiST index would (still) even support a "nearest neighbor" (KNN) search with LIMIT. Related:

You can do more, depends on your exact use case and requirements. Example:

...
ORDER  BY sml DESC
        , my_column <> '$ Hello'
        , my_column !~ '\$ Hello'            -- note $ escaped with \$
        , levenshtein(my_column , '$ Hello')
        , my_column;

Within the same trigram similarity, this sorts exact matches first, then strings containing the exact phrase. And within each subgroup matches the shorter Levenshtein distance first. Alphabetical as final tiebreaker. Related:

Last but not least, you tagged full text search. But your example is based on trigram similarity (provided by the additional module pg_trgm), which is a largely different concept with completely separate infrastructure and operators. You may want actual full text search instead (and phrase search with that?):

But punctuation characters are considered noise and stripped in FTS all the same. Same "problem". ts_debug() shows how your text search configuration classifies identified tokens in a given string (simple configuration in the example).

SELECT * FROM ts_debug('simple', '? Hello %&/( 123');

It starts with the "default" parser (currently the only one), parsing all these as noise to begin with ...

@Lloyd Powell 2019-07-09 08:45:50

What a fantastic answer with a variety of solutions to the problem I was facing. Thank you for your time! Most appreciated!

@jjanes 2019-07-08 16:01:22

Ignoring punctuation is how pg_trgm works, as documented.

If you are feeling adventurous, you can remove the line #define KEEPONLYALNUM from "contrib/pg_trgm/trgm.h", recompile and reinstall.

This will silently corrupt any existing pg_trgm indexes you have, you will need to reindex them.

@Laurenz Albe 2019-07-08 15:15:49

SELECT show_trgm('$ Hello');

            show_trgm            
---------------------------------
 {"  h"," he",ell,hel,llo,"lo "}
(1 row)

SELECT show_trgm('- Hello');

            show_trgm            
---------------------------------
 {"  h"," he",ell,hel,llo,"lo "}
(1 row)

All non-alphanumeric characters are stripped from a string before the trigrams are calculated.

That's why the strings have similarity 1.

You cannot use trigram indexes for searching for symbols.

The documentation has it in its first sentence (emphasis mine):

The pg_trgm module provides functions and operators for determining the similarity of alphanumeric text based on trigram matching

Related Questions

Sponsored Content

2 Answered Questions

1 Answered Questions

1 Answered Questions

1 Answered Questions

[SOLVED] Fulltext match() against() returning empty set

1 Answered Questions

[SOLVED] Return matches that have at least one message

  • 2017-11-01 03:06:40
  • Arya
  • 37 View
  • 0 Score
  • 1 Answer
  • Tags:   postgresql

1 Answered Questions

1 Answered Questions

[SOLVED] SQL MATCH AGAINST incomplete string with space

1 Answered Questions

1 Answered Questions

1 Answered Questions

[SOLVED] SQL Server 2005: Full-text search, order by matches

Sponsored Content