Skip to content

fix: ensure accented characters are sorted with their base letters #12284

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: 4.6.x
Choose a base branch
from

Conversation

vikrantgravitee
Copy link
Contributor

@vikrantgravitee vikrantgravitee commented Jun 24, 2025

Issue

https://gravitee.atlassian.net/browse/APIM-9910

Description

Now, API names are sorted using locale-aware comparison, ensuring correct collation.

Fix:

api_sort.mov

Additional context

@vikrantgravitee vikrantgravitee requested a review from a team as a code owner June 24, 2025 14:34
@vikrantgravitee vikrantgravitee force-pushed the APIM-9910-fix-apis-sorting branch 5 times, most recently from b347d55 to 4f3a8d1 Compare July 1, 2025 07:35
@vikrantgravitee vikrantgravitee force-pushed the APIM-9910-fix-apis-sorting branch 3 times, most recently from 5a0842a to dc34e37 Compare July 3, 2025 08:40
Comment on lines 252 to 257
if (value == null) return new BytesRef("");
collator.setStrength(Collator.SECONDARY);
CollationKey key = collator.getCollationKey(value);
return new BytesRef(key.toByteArray());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to no longer handle special characters.
Can you handle when there are special characters and add a test that makes sure they are handled appropriately?

Comment on lines 258 to 263
private String normalizeForSorting(String value) {
if (value == null) return "";
collator.setStrength(Collator.SECONDARY);
CollationKey key = collator.getCollationKey(value);
return Base64.getEncoder().encodeToString(key.toByteArray());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite similar to the toSortedValue. Can you mutualize the shared functionalities?

@@ -231,7 +237,7 @@ private void appendPath(final Document doc, final int[] pathIndex, final String
doc.add(new StringField(FIELD_PATHS, path, Field.Store.NO));
doc.add(new TextField(FIELD_PATHS_SPLIT, path, Field.Store.NO));
if (pathIndex[0]++ == 0) {
doc.add(new SortedDocValuesField(FIELD_PATHS_SORTED, new BytesRef(QueryParser.escape(path))));
doc.add(new SortedDocValuesField(FIELD_PATHS_SORTED, new BytesRef(normalizeForSorting(path))));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The QueryParser.escape handled special characters. So this would also need to be done in your fix 👍

@Test
void should_sort_names_by_bytesref() throws Exception {
List<String> names = List.of("nano", "Zorro", "Äther", "VEM", "épée", "Épona", "Öko", "BNS");
List<String> expectedSorted = List.of("Äther", "BNS", "épée", "Épona", "nano", "Öko", "VEM", "Zorro");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe these should all be lowercase. The name field is sorted by lowercase names in the IndexableApiDocumentTransformer 🤔


@Test
public void shouldSortListCorrectlyWithCollatorAndBytesRef() throws Exception {
List<String> names = List.of("nano", "zorro", "äther", "vem", "foo/bar", "épée", "épona", "öko", "bns-one");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT of passing names with capital letters in it to see that the toLowerCase works as expected? (same for IndexableApiDocumentTransformerTest.java tests)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jourdiw

I updated the test case to show mix behaviour (upper case, lower case and special chars) and the reason is that we don't need specifically lower case because now, collator is being used and it is case-insensitive but it will sort the names in right order now.

@vikrantgravitee vikrantgravitee force-pushed the APIM-9910-fix-apis-sorting branch from eae1d89 to 9895483 Compare July 4, 2025 18:35
@vikrantgravitee vikrantgravitee force-pushed the APIM-9910-fix-apis-sorting branch from 9895483 to 36b201b Compare July 4, 2025 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants