
Improve performance of STARTING WITH with insensitive collations #7038

@asfernandes

To process STARTING WITH with insensitive collations, it's first necessary to generate canonical bytes of the matching strings.

If the matching string is much longer than the pattern string, time is wasted generating unneeded canonical bytes.

It's only necessary to generate canonical bytes for the initial substring with the same length as the pattern string.
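
As a rough illustration of the idea (not the actual Firebird code), the sketch below canonicalizes only the leading part of the matching string. It assumes a toy canonical form, ASCII case folding with one canonical byte per character; real collations such as UNICODE_CI may not have that one-to-one property. The names `canonicalPrefix` and `startsWithInsensitive` are hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a collation's canonical transformation:
// ASCII case folding, one canonical byte per input character (an assumption).
// Converts at most `limit` leading characters of `text`.
std::string canonicalPrefix(const std::string& text, std::size_t limit)
{
    const std::size_t count = std::min(limit, text.size());
    std::string canonical;
    canonical.reserve(count);

    for (std::size_t i = 0; i < count; ++i)
    {
        const unsigned char ch = static_cast<unsigned char>(text[i]);
        canonical.push_back(static_cast<char>(std::tolower(ch)));
    }

    return canonical;
}

// Case-insensitive STARTING WITH that canonicalizes only the prefix of `str`
// that can take part in the comparison, instead of the whole string.
bool startsWithInsensitive(const std::string& str, const std::string& pattern)
{
    if (pattern.size() > str.size())
        return false;  // a shorter string cannot start with a longer pattern

    const std::string canonicalPattern = canonicalPrefix(pattern, pattern.size());
    const std::string canonicalStr = canonicalPrefix(str, pattern.size());

    return canonicalStr == canonicalPattern;
}
```

With a 60-character matching string and a 1-character pattern, this sketch transforms 1 character of the matching string rather than 60, which is where the saving would come from.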

In my tests with character set WIN1252 and collation WIN_PTBR, using a matching string of length 60 and a pattern string of length 1, I see a performance improvement of ~30%.

With character set UTF8 and collation UNICODE_CI, I see a performance improvement of ~50% in the same test.

Test:

set term !;

execute block
as
    declare p varchar(1) character set win1252 collate win_ptbr = 'x';
    declare s varchar(60) character set win1252 collate win_ptbr = 'x12345678901234567890123456789012345678901234567890123456789';
    declare n integer = 0;
    declare b boolean;
begin
    while (n < 1000000)
    do
    begin
        b = s starting with p;
        n = n + 1;
    end
end!
execute block
as
    declare p varchar(1) character set utf8 collate unicode_ci = 'x';
    declare s varchar(60) character set utf8 collate unicode_ci = 'x12345678901234567890123456789012345678901234567890123456789';
    declare n integer = 0;
    declare b boolean;
begin
    while (n < 1000000)
    do
    begin
        b = s starting with p;
        n = n + 1;
    end
end!
