By Evan Carroll

2017-11-28 16:18:12 8 Comments

ICU Specifies different LDML Collation Settings. Some of them seem pretty interesting, especially the ones on case and accent,

  • “Ignore accents”: strength=primary
  • “Ignore accents” but take case into account: strength=primary caseLevel=on
  • “Ignore case”: strength=secondary
  • “Ignore punctuation” (completely): strength=tertiary alternate=shifted
  • “Ignore punctuation” but distinguish among punctuation marks: strength=quaternary alternate=shifted potentially a better method of doing what

You can also see these documented here. Are these ICU options and settings possible with PostgreSQL 10 ICU collation support?

CREATE COLLATION special (provider = icu, locale = '[email protected]=primary');
SELECT 'Å' LIKE 'A' COLLATE "special"; # returns false

I've also tried the CLDR BCP47

Starting with ICU 54, collation attributes can be specified via locale keywords as well, in the old locale extension syntax ("[email protected]=upper") or in language tag syntax ("el-u-kf-upper"). Keywords and values are case-insensitive. See the LDML Collation spec, Collation Settings, and the data file listing the valid collation keywords and their values. (The deprecated attributes kh/colHiraganaQuaternary and vt/variableTop are not supported.)

For that, this looked right

CREATE COLLATION special (provider = icu, locale = 'en-ks-level1');
SELECT 'Å' LIKE 'A' COLLATE "special"; # returns false

Also tried en-u-ks-level1 That method seems to be what the docs go for,

CREATE COLLATION german_phonebook (provider = icu, locale = 'de-u-co-phonebk');


@Daniel Vérité 2017-12-01 14:53:23

Case insensitive or accent-insensitive collations cannot be used prior to PostgreSQL 12, because internally PostgreSQL considers that strings with a different binary representation are not equal. When the collation-aware comparator says they are equal, it uses the non-collation-aware strcmp() function as a tie-breaker, to obtain what Unicode calls a "deterministic" comparison without normalization.

Starting with PostgreSQL 12, collations have a deterministic property, which must be set to false to benefit from equality of non-binary equal strings. From CREATE COLLATION:


Specifies whether the collation should use deterministic comparisons. The default is true. A deterministic comparison considers strings that are not byte-wise equal to be unequal even if they are considered logically equal by the comparison. PostgreSQL breaks ties using a byte-wise comparison. Comparison that is not deterministic can make the collation be, say, case- or accent-insensitive. For that, you need to choose an appropriate LC_COLLATE setting and set the collation to not deterministic here.

Nondeterministic collations are only supported with the ICU provider.

The exact query in the question is not possible with nondeterministic collations because they don't support LIKE or any form of pattern matching (as of PostgreSQL 12). Also the locale should have a -u- before the collation subtags, otherwise they're going to be silently ignored by the collator.

What does work as intended:

     provider = icu, locale = 'en-u-ks-level1', deterministic=false

=# SELECT 'Å' = 'A' COLLATE "special";

@a_horse_with_no_name 2019-10-04 10:12:33

I wonder if this is possible now with Postgres 12

@Daniel Vérité 2019-10-04 12:23:05

@a_horse_with_no_name: answer rewritten for Postgres 12.

@a_horse_with_no_name 2019-10-04 12:29:38

Ah, great. Thanks. Does not seem to work on Windows though. This example returns on nothing when run on Windows

@Daniel Vérité 2019-10-04 14:18:49

@a_horse_with_no_name: a crash is also reported so there seems to be something funky with ICU and Pg 12.0 on Windows. I guess you used the EDB build?

@a_horse_with_no_name 2019-10-04 14:22:20

Yes, I use the EDB build

Related Questions

Sponsored Content

2 Answered Questions

[SOLVED] Does PostgreSQL support online schema modification (DDL)?

1 Answered Questions

1 Answered Questions

1 Answered Questions

1 Answered Questions

[SOLVED] Does PostgreSQL PDPG support Yakkety (Ubuntu 16.10)?

1 Answered Questions

[SOLVED] Does PostgreSQL support multi-threaded replication?

1 Answered Questions

[SOLVED] Unicode support of PostgreSQL and its performance

Sponsored Content