To evaluate STARTING WITH with insensitive collations, the engine must first generate the canonical bytes of the strings being matched.
If the matched string is much longer than the pattern string, time is wasted generating canonical bytes that are never compared.
It is only necessary to generate canonical bytes for the initial substring that has the same length as the pattern string.
In my tests with character set WIN1252 collate WIN_PTBR, using matched strings of length 60 and a pattern string of length 1, I see a performance improvement of ~30%.
With character set UTF8 collate UNICODE_CI, I see a performance improvement of ~50% in the same test.
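The idea can be illustrated with a small sketch. This is not the engine's code: `canonical` is a hypothetical stand-in that maps each character to one case-folded key, and the function names are made up for illustration. It only shows that converting the first len(pattern) characters of the matched string gives the same answer as converting the whole string.

```python
def canonical(text):
    """Hypothetical canonical form: one case-folded key per character."""
    return [ch.casefold() for ch in text]


def starts_with_full(s, p):
    """Old approach: convert all of s, then compare only its prefix."""
    cp = canonical(p)
    cs = canonical(s)  # work grows with len(s), even for a short pattern
    return cs[:len(cp)] == cp


def starts_with_prefix(s, p):
    """Proposed approach: convert only the first len(p) characters of s."""
    cp = canonical(p)
    cs = canonical(s[:len(p)])  # work grows with len(p) only
    return cs == cp


if __name__ == "__main__":
    s = "X" + "1234567890" * 5 + "123456789"  # 60 characters, as in the test
    print(starts_with_full(s, "x"), starts_with_prefix(s, "x"))
```

Both functions agree because this stand-in converts each character independently. Whether that holds for every real collation is an assumption of the sketch.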
Test:
set term !;

-- WIN1252 / WIN_PTBR: 60-character string matched against a 1-character pattern
execute block
as
    declare p varchar(1) character set win1252 collate win_ptbr = 'x';
    declare s varchar(60) character set win1252 collate win_ptbr = 'x12345678901234567890123456789012345678901234567890123456789';
    declare n integer = 0;
    declare b boolean;
begin
    while (n < 1000000) do
    begin
        b = s starting with p;
        n = n + 1;
    end
end!

-- UTF8 / UNICODE_CI: same test
execute block
as
    declare p varchar(1) character set utf8 collate unicode_ci = 'x';
    declare s varchar(60) character set utf8 collate unicode_ci = 'x12345678901234567890123456789012345678901234567890123456789';
    declare n integer = 0;
    declare b boolean;
begin
    while (n < 1000000) do
    begin
        b = s starting with p;
        n = n + 1;
    end
end!

set term ;!