GraphemeCursor::is_boundary returns wrong value inside emoji when chunked one codepoint at a time #139

Open
@pfgithub

Description

In the first test, the chunk containing both [man] and [zwj] is passed to provide_context in a single call. This works and returns the expected result. In the second test, the chunk containing [zwj] is provided first, and then the chunk containing [man]. This fails: is_boundary returns Ok(true), as if there were a boundary in the middle of the emoji.

#[cfg(test)]
mod tests {
    use unicode_segmentation::{GraphemeCursor, GraphemeIncomplete::*};

    const FAMILY_EMOJI: &str = "A\u{1F468}\u{200D}\u{1F469}\u{1F467}B";
    // Byte offsets: [0: A] [1: MAN] [5: ZERO WIDTH JOINER] [8: WOMAN] [12: GIRL] [16: B]

    // Pre-context [MAN][ZWJ] is supplied in a single chunk.
    #[test]
    fn passes() {
        let mut cursor = GraphemeCursor::new(8, FAMILY_EMOJI.len(), true);
        assert_eq!(cursor.is_boundary(&FAMILY_EMOJI[8..], 8), Err(PreContext(8)));
        cursor.provide_context(&FAMILY_EMOJI[1..8], 1);
        assert_eq!(cursor.is_boundary(&FAMILY_EMOJI[8..], 8), Ok(false));
    }

    // Same pre-context, supplied one code point at a time: [ZWJ] first, then [MAN].
    #[test]
    fn fails() {
        let mut cursor = GraphemeCursor::new(8, FAMILY_EMOJI.len(), true);
        assert_eq!(cursor.is_boundary(&FAMILY_EMOJI[8..], 8), Err(PreContext(8)));
        cursor.provide_context(&FAMILY_EMOJI[5..8], 5);
        assert_eq!(cursor.is_boundary(&FAMILY_EMOJI[8..], 8), Err(PreContext(5)));
        cursor.provide_context(&FAMILY_EMOJI[1..5], 1);
        // Expected Ok(false); this assertion fails because the actual result is Ok(true).
        assert_eq!(cursor.is_boundary(&FAMILY_EMOJI[8..], 8), Ok(false));
    }
}
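
The byte offsets used as chunk boundaries in the tests above (1, 5 and 8) can be double-checked with the standard library alone. This is only a minimal sketch: codepoint_offsets is a hypothetical helper introduced here for illustration and is not part of unicode_segmentation.

```rust
/// Hypothetical helper (not from unicode_segmentation): returns the
/// (byte offset, scalar value) pair for every code point in `s`.
fn codepoint_offsets(s: &str) -> Vec<(usize, u32)> {
    // `char_indices` yields the byte index at which each `char` starts.
    s.char_indices().map(|(i, c)| (i, c as u32)).collect()
}
```

Called on the test string "A\u{1F468}\u{200D}\u{1F469}\u{1F467}B", this yields offsets 0, 1, 5, 8, 12 and 16 for a string of 17 bytes, so the slice [5..8] is exactly the zero width joiner and [1..5] is exactly MAN.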

Potentially related: #118
