-
Notifications
You must be signed in to change notification settings - Fork 75
fix: ensure accented characters are sorted with their base letters #12284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 4.6.x
Are you sure you want to change the base?
Conversation
b347d55
to
4f3a8d1
Compare
5a0842a
to
dc34e37
Compare
if (value == null) return new BytesRef(""); | ||
collator.setStrength(Collator.SECONDARY); | ||
CollationKey key = collator.getCollationKey(value); | ||
return new BytesRef(key.toByteArray()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to no longer handle special characters.
Can you handle when there are special characters and add a test that makes sure they are handled appropriately?
private String normalizeForSorting(String value) { | ||
if (value == null) return ""; | ||
collator.setStrength(Collator.SECONDARY); | ||
CollationKey key = collator.getCollationKey(value); | ||
return Base64.getEncoder().encodeToString(key.toByteArray()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite similar to the toSortedValue
. Can you mutualize the shared functionalities?
@@ -231,7 +237,7 @@ private void appendPath(final Document doc, final int[] pathIndex, final String | |||
doc.add(new StringField(FIELD_PATHS, path, Field.Store.NO)); | |||
doc.add(new TextField(FIELD_PATHS_SPLIT, path, Field.Store.NO)); | |||
if (pathIndex[0]++ == 0) { | |||
doc.add(new SortedDocValuesField(FIELD_PATHS_SORTED, new BytesRef(QueryParser.escape(path)))); | |||
doc.add(new SortedDocValuesField(FIELD_PATHS_SORTED, new BytesRef(normalizeForSorting(path)))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The QueryParser.escape
handled special characters. So this would also need to be done in your fix 👍
@Test | ||
void should_sort_names_by_bytesref() throws Exception { | ||
List<String> names = List.of("nano", "Zorro", "Äther", "VEM", "épée", "Épona", "Öko", "BNS"); | ||
List<String> expectedSorted = List.of("Äther", "BNS", "épée", "Épona", "nano", "Öko", "VEM", "Zorro"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe these should all be lowercase. The name
field is sorted by lowercase names in the IndexableApiDocumentTransformer
🤔
5de3d20
to
9df8998
Compare
9df8998
to
eae1d89
Compare
|
||
@Test | ||
public void shouldSortListCorrectlyWithCollatorAndBytesRef() throws Exception { | ||
List<String> names = List.of("nano", "zorro", "äther", "vem", "foo/bar", "épée", "épona", "öko", "bns-one"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT of passing names with capital letters in it to see that the toLowerCase
works as expected? (same for IndexableApiDocumentTransformerTest.java
tests)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jourdiw
I updated the test case to show mix behaviour (upper case, lower case and special chars) and the reason is that we don't need specifically lower case because now, collator is being used and it is case-insensitive but it will sort the names in right order now.
eae1d89
to
9895483
Compare
9895483
to
36b201b
Compare
Issue
https://gravitee.atlassian.net/browse/APIM-9910
Description
Now, API names are sorted using locale-aware comparison, ensuring correct collation.
Fix:
api_sort.mov
Additional context