
Behavior of \b word anchor when applied to non-word characters #1275

Closed
@learnbyexample


What version of ripgrep are you using?

$ rg --version
ripgrep 11.0.1 (rev 1f1cd9b467)
-SIMD -AVX (compiled)
+SIMD -AVX (runtime)

How did you install ripgrep?

Using https://github.com/BurntSushi/ripgrep/releases/download/11.0.1/ripgrep_11.0.1_amd64.deb

What operating system are you using ripgrep on?

Ubuntu 16.04.1

Describe your question, feature request, or bug.

The behavior of the \b word anchor when applied to non-word characters differs from what other regex engines do. I found this by accident; it isn't a use case I need. I'm filing the issue anyway, in case this is a bug.

If this is a bug, what are the steps to reproduce the behavior?

See example below.

If this is a bug, what is the actual behavior?

$ echo 'I have 12, he has 2!' | rg -o '\b..\b'
I 
12

If this is a bug, what is the expected behavior?

I think ripgrep should behave the same way as other regex engines:

$ # GNU grep 3.3
$ # grep -oP '\b..\b' and rg -oP '\b..\b' also produce the same result
$ echo 'I have 12, he has 2!' | grep -o '\b..\b'
I 
12
, 
he
 2

$ echo 'I have 12, he has 2!' | perl -lne 'print join "|", /\b..\b/g'
I |12|, |he| 2
$ echo 'I have 12, he has 2!' | ruby -ne 'puts $_.scan(/\b..\b/).join("|")'
I |12|, |he| 2

$ # python3.7
>>> import re
>>> re.findall(r'\b..\b', 'I have 12, he has 2!')
['I ', '12', ', ', 'he', ' 2']

The image below may help explain what all these regex engines are doing. A vertical bar marks a word boundary. Note that even though ! is at the end of the line, there is no boundary after it, because it is not a word character.

[Image: ripgrep_word_boundary]
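To make the picture concrete, here is a small Python sketch (Python is one of the engines compared above) that computes where \b can match in the test string. It uses the usual rule: a position is a boundary when the character before it and the character after it differ in "wordness", with positions outside the string counting as non-word. The helper names boundary_positions and mark_boundaries are hypothetical, written only for this illustration, and "word character" is assumed to mean whatever Python's \w matches.

```python
import re


def is_word(ch):
    # Assumption: a "word character" is exactly what Python's \w matches.
    return re.fullmatch(r"\w", ch) is not None


def boundary_positions(text):
    # Hypothetical helper: return every position p (0 <= p <= len(text))
    # where \b can match. Positions outside the string count as non-word,
    # so a trailing "!" gets no boundary after it.
    positions = []
    for p in range(len(text) + 1):
        before = p > 0 and is_word(text[p - 1])
        after = p < len(text) and is_word(text[p])
        if before != after:
            positions.append(p)
    return positions


def mark_boundaries(text):
    # Hypothetical helper: insert "|" at every boundary, like the image.
    bounds = set(boundary_positions(text))
    out = []
    for p, ch in enumerate(text):
        if p in bounds:
            out.append("|")
        out.append(ch)
    if len(text) in bounds:
        out.append("|")
    return "".join(out)


text = "I have 12, he has 2!"
print(boundary_positions(text))
print(mark_boundaries(text))         # |I| |have| |12|, |he| |has| |2|!
print(re.findall(r"\b..\b", text))   # ['I ', '12', ', ', 'he', ' 2']
```

Because \b..\b needs a boundary at both ends of a two-character span, every match must start and end on one of the bars; scanning left to right for such spans yields the same five strings that Python, Perl, Ruby and GNU grep print above.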

Note

  • grep -o '\<..\>' (and the same pattern in vim) gives different results from \b, because \b cannot distinguish between the start and the end of a word
  • grep -ow '..' gives the same result as rg -ow '..' (perhaps because both implement the -w option as (?<!\w)pattern(?!\w))
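The patterns in the two notes can be contrasted in Python as well. Python's re has no \< or \> anchors, so this sketch assumes they can be emulated with lookarounds: (?<!\w)(?=\w) for the start of a word and (?<=\w)(?!\w) for the end of a word. The third pattern is the (?<!\w)pattern(?!\w) form that the note guesses -w uses. This only shows how Python treats each pattern; it is not a claim about what grep or rg run internally.

```python
import re

text = "I have 12, he has 2!"

# \b..\b: each end may sit on a boundary of either kind (start or end of word).
plain = re.findall(r"\b..\b", text)

# Assumed emulation of \<..\>: the match must begin at a start-of-word
# position and finish at an end-of-word position.
word_anchored = re.findall(r"(?<!\w)(?=\w)..(?<=\w)(?!\w)", text)

# Lookaround form from the note: no word character may touch the match
# from outside, but the match itself may begin or end with a non-word char.
w_style = re.findall(r"(?<!\w)..(?!\w)", text)

print(plain)          # ['I ', '12', ', ', 'he', ' 2']
print(word_anchored)  # ['12', 'he']
print(w_style)        # ['12', 'he', '2!']
```

With the lookaround form, 'I ' is rejected because 'h' follows it, and ', ' and ' 2' are rejected because a word character sits right before them, while '2!' qualifies since nothing after it is a word character.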

Labels: bug (A bug.), rollup (A PR that has been merged with many others in a rollup.)