nfa vs dfa mismatch #345

Closed

@BurntSushi
Member

This program does not pass all of its assertions: the default engine and the forced NFA engine disagree on the same pattern and haystack.

extern crate regex;

use regex::internal::ExecBuilder;

// The bytes "=\x86=y". The lone 0x86 byte makes the haystack invalid UTF-8.
pub const HAYSTACK_BYTES: &'static [u8] = b"\x3D\x86\x3D\x79";

fn main() {
    let pattern = r"=\b";
    let haystack = &HAYSTACK_BYTES;
    // re0 uses the default engine selection; re1 forces the NFA engine.
    let re0 = ExecBuilder::new(pattern).build().unwrap().into_byte_regex();
    let re1 = ExecBuilder::new(pattern).nfa().build().unwrap().into_byte_regex();
    // Both engines should agree on whether there is a match and on where it is.
    assert_eq!(re0.is_match(haystack), re1.is_match(haystack));
    assert_eq!(re0.find_iter(haystack).collect::<Vec<_>>(),
               re1.find_iter(haystack).collect::<Vec<_>>());
}

Found by @lukaslueg in #321

BurntSushi commented on Feb 20, 2017

Looks like this is probably a discrepancy in how the matching engines handle word boundaries next to invalid UTF-8 (the \x86 byte in the haystack is not valid UTF-8 on its own). Notably, the pattern =(?-u:\b) also fails, but with the is_match results inverted.
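For reference, here is a minimal std-only sketch of what an ASCII word boundary (the `(?-u:\b)` case) means on this haystack when every byte is examined independently. It is not the regex crate's implementation, and it says nothing about which engine was right. The helper names `is_ascii_word_byte`, `is_ascii_word_boundary` and `find_eq_then_boundary` are hypothetical and exist only for this illustration.

```rust
/// ASCII notion of a word byte: [0-9A-Za-z_].
/// Assumption: this mirrors what `(?-u:\w)` is documented to match.
fn is_ascii_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// True if offset `at` is an ASCII word boundary, i.e. exactly one of the
/// bytes immediately before and after `at` is a word byte. Offsets at the
/// start or end of the haystack treat the missing side as a non-word byte.
fn is_ascii_word_boundary(haystack: &[u8], at: usize) -> bool {
    let before = at > 0 && is_ascii_word_byte(haystack[at - 1]);
    let after = at < haystack.len() && is_ascii_word_byte(haystack[at]);
    before != after
}

/// Hypothetical helper: (start, end) spans of every `=` byte that is
/// immediately followed by an ASCII word boundary.
fn find_eq_then_boundary(haystack: &[u8]) -> Vec<(usize, usize)> {
    haystack
        .iter()
        .enumerate()
        .filter(|&(i, &b)| b == b'=' && is_ascii_word_boundary(haystack, i + 1))
        .map(|(i, _)| (i, i + 1))
        .collect()
}
```

Under these byte-wise rules, offset 1 (between `=` and `\x86`) is not a boundary because neither neighbor is a word byte, while offset 3 (between `=` and `y`) is. So `find_eq_then_boundary(b"\x3D\x86\x3D\x79")` yields the single span `(2, 3)`. The Unicode `\b` case is where decoding comes in, since the engine has to decide what to do with the undecodable `\x86` byte.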

SeanRBurton commented on Dec 10, 2017

@SeanRBurton
Contributor

A similar example:

extern crate regex;

use regex::internal::ExecBuilder;

fn main() {
    let pattern = r"$\B(?m:^)";
    // re0 uses the default engine selection; re1 forces the NFA engine.
    let re0 = ExecBuilder::new(pattern).only_utf8(false).build().unwrap().into_regex();
    let re1 = ExecBuilder::new(pattern).only_utf8(false).nfa().build().unwrap().into_regex();
    assert_eq!(re0.is_match("\n"), re1.is_match("\n"));
}

BurntSushi commented on May 17, 2022

This appears to no longer be an issue. I suspect it was fixed by #561.

nfa vs dfa mismatch · Issue #345 · rust-lang/regex