Improve performance of STARTING WITH with insensitive collations #7038

To evaluate `STARTING WITH` under insensitive (case- or accent-insensitive) collations, the canonical bytes of the matched string must first be generated.

If the matched string is much longer than the pattern, time is wasted generating canonical bytes that are never compared.

Only the canonical bytes of the initial substring with the same length as the pattern need to be generated.

In my tests with `character set WIN1252 collate WIN_PTBR`, matching a string of length 60 against a pattern of length 1, I see a performance improvement of ~30%.

With `character set UTF8 collate UNICODE_CI`, the same test shows a performance improvement of ~50%.

Tests (1,000,000 iterations each):

```
execute block
as
    declare p varchar(1) character set win1252 collate win_ptbr = 'x';
    declare s varchar(60) character set win1252 collate win_ptbr = 'x12345678901234567890123456789012345678901234567890123456789';
    declare n integer = 0;
    declare b boolean;
begin
    while (n < 1000000)
    do
    begin
        b = s starting with p;
        n = n + 1;
    end
end!
```

```
execute block
as
    declare p varchar(1) character set utf8 collate unicode_ci = 'x';
    declare s varchar(60) character set utf8 collate unicode_ci = 'x12345678901234567890123456789012345678901234567890123456789';
    declare n integer = 0;
    declare b boolean;
begin
    while (n < 1000000)
    do
    begin
        b = s starting with p;
        n = n + 1;
    end
end!
```
